SUMMARY

A new method of probability density estimation is investigated which exploits the Fourier series representation of a density function. The new method employs density estimators $\hat f_{p,q}(\cdot)$, $p = 0,1,2,\ldots$ and $q = 0,1,2,\ldots$, which are such that $\hat f_{0,q}(\cdot)$ is a Fourier series (Kronmal-Tarter type) estimator and $\hat f_{p,0}(\cdot)$ is an autoregressive estimator. Each of the estimators $\hat f_{p,q}(\cdot)$ (referred to as ARMA estimators) is shown to depend upon the $e_n$-transform, thus providing a strong motivation for the use of estimators with both $p > 0$ and $q > 0$. Small and large sample properties of ARMA density estimators are obtained and a data-based method of selecting optimal values of $p$ and $q$ is proposed.

The results of a simulation study show that, for the densities considered, a savings in integrated square error is attained by using ARMA, rather than Fourier series, density estimation.

CHAPTER I


INTRODUCTION 


1.1  Introduction 

The purpose of this work is to investigate a method of probability density estimation which is based upon what will be called the ARMA method of approximating a function. The ARMA method employs representations of the form

$$f_{p,q}(x) = \frac{\sum_{k=-q}^{q} \beta_k e^{ikx}}{\left|1 - \alpha_1 e^{ix} - \cdots - \alpha_p e^{ipx}\right|^2} \qquad (\beta_{-k} = \bar{\beta}_k) \tag{1.1}$$

to approximate the real-valued function $f(\cdot)$ over the interval $[-\pi,\pi]$. The acronym ARMA is used because, if $f_{p,q}(\cdot)$ is nonnegative, its numerator may be expressed as

$$k\left|1 - \theta_1 e^{ix} - \cdots - \theta_q e^{iqx}\right|^2 \quad \text{for all } x \in [-\pi,\pi].$$

Expressed in this way, $f_{p,q}(\cdot)$ is seen to have a form equivalent to the spectrum of an autoregressive, moving average (ARMA) process.

Because of the wide applicability of the ARMA model in time series, representations such as (1.1) have a very natural motivation in spectral estimation. The motivation, to be developed fully in succeeding chapters, for their use in probability density estimation must obviously be somewhat different. For the present we simply point out that the relationship of $f_{p,q}(\cdot)$ to a numerical analysis tool known as the $e_n$-transform implies that ARMA representations are attractive as an approximation scheme. Their value as an approximation scheme in turn suggests their possible value in the estimation setting.

In Chapter II definitions of the constants $\beta_k$ $(k = 0,1,\ldots,q)$ and $\alpha_k$ $(k = 1,\ldots,p)$ will be given which, for a given function $f(\cdot)$, uniquely define an approximator $f_{p,q}(\cdot)$ for each pair of values $(p,q)$. The approximator so defined depends only upon the Fourier coefficients $\phi(0), \phi(1), \ldots, \phi(p+q)$, where

$$\phi(v) = \int_{-\pi}^{\pi} e^{-ivx} f(x)\,dx, \quad |v| = 0,1,2,\ldots$$

(note that $\phi(-v) = \overline{\phi(v)}$). Thus if $f(\cdot)$ is the probability density function of a random variable with support $[-\pi,\pi]$, estimators $\hat f_{p,q}(\cdot)$ of $f(\cdot)$ can be formed by estimating the Fourier coefficients of $f(\cdot)$.

In light of the many existing techniques of density estimation, one might reasonably question the consideration of the class of estimators just described. In order to be of more than simply academic interest, a new technique should either have the potential for improvement over existing techniques or shed some informative light on them. It is hoped to show that the method of density estimation being proposed satisfies both of these requirements with respect to

(i) Fourier series density estimators, and
(ii) autoregressive density estimators.

It will be seen shortly that these two classes of estimators are members of the general class of ARMA density estimators. Before embarking on an investigation of ARMA estimators, it will thus be expedient to briefly discuss the origin and properties of Fourier series and autoregressive density estimators.

1.2 Fourier Series Density Estimation

Cencov (1962) first suggested the use of Fourier series ideas in the estimation of a probability density function. Let $L_2(r)$ be a Hilbert space whose inner product is defined by

$$(\phi,\psi) = \int_{-\infty}^{\infty} \phi(x)\psi(x)r(x)\,dx,$$

where $r$ is a weight function. Let $f(\cdot)$ be the density of a random variable $X$ and assume that $f(\cdot) \in L_2(r)$. Now, suppose $E_m$ is an arbitrary $m$-dimensional subspace of $L_2(r)$ with orthonormal basis $\{\xi_{1m}, \ldots, \xi_{mm}\}$. The best mean square error approximation of $f(x)$ in $E_m$ is

$$f_m(x) = \sum_{k=1}^{m} a_{km}\xi_{km}(x), \qquad a_{km} = (\xi_{km}, f) = \int \xi_{km}(x)r(x)\,dF(x).$$

If a random sample $X_1, \ldots, X_n$ is obtained from $f(\cdot)$, then Cencov suggests estimating $f(x)$ by

$$\hat f_m(x) = \sum_{k=1}^{m} \hat a_{km}\xi_{km}(x),$$

where

$$\hat a_{km} = \frac{1}{n}\sum_{j=1}^{n} \xi_{km}(X_j)r(X_j).$$

Cencov points out that $E\|\hat f_m(x) - f(x)\|^2$ can be made arbitrarily small by choosing a sufficiently good approximating subspace $E_m$ and then taking a large enough number $n$ of observations.

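Kronmal and Tarter (1968) have investigated a special case of the above by considering the weight function $r(x) = I_{[a,b]}(x)$ and the orthonormal system

$$\left\{\frac{1}{\sqrt{b-a}},\ \sqrt{\frac{2}{b-a}}\cos k\pi\left(\frac{x-a}{b-a}\right),\ k = 1,2,\ldots\right\}.$$

Based on this system an estimator of $f(x)$ $(x \in [a,b])$ is

$$\hat f_m(x) = \frac{\hat c_0}{2} + \sum_{k=1}^{m} \hat c_k \cos k\pi\left(\frac{x-a}{b-a}\right),$$

where

$$\hat c_k = \frac{2}{n(b-a)}\sum_{j=1}^{n} \cos\frac{k\pi(X_j-a)}{b-a}\, I_{[a,b]}(X_j).$$

It can be shown that

$$\operatorname{cov}(\hat c_j, \hat c_k) = \frac{1}{n}\left[\frac{1}{b-a}\left(c_{j+k} + c_{j-k}\right) - c_j c_k\right] \quad (j \ge k),$$

where

$$c_k = \frac{2}{b-a}\,E\left[\cos\frac{k\pi(X-a)}{b-a}\right].$$

This leads to a simple expression for the mean integrated square error (MISE) of $\hat f_m(\cdot)$, namely

$$E\int_a^b \left(\hat f_m(x) - f(x)\right)^2 r(x)\,dx = \frac{b-a}{2}\left[\sum_{k=1}^{m} E(\hat c_k - c_k)^2 + \sum_{k=m+1}^{\infty} c_k^2\right] = \frac{b-a}{2}\left[\frac{1}{n}\sum_{k=1}^{m}\left(\frac{c_0 + c_{2k}}{b-a} - c_k^2\right) + \sum_{k=m+1}^{\infty} c_k^2\right].$$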

Making use of these results Kronmal and Tarter prove the following theorem.

Theorem 1.1 If the Fourier cosine series of the density $f(\cdot)$ converges uniformly and if $m = o(\sqrt{n})$, then

$$\lim_{n\to\infty} E\left(\hat f_m(x) - f(x)\right)^2 = 0 \quad (\text{uniformly in } x \in [a,b])$$

and

$$\lim_{n\to\infty} E\int_a^b \left(\hat f_m(x) - f(x)\right)^2 dx = 0.$$


The importance of this theorem is its establishment of the rate at which the truncation point $m$ may increase with the sample size in order for $\hat f_m(\cdot)$ to be a consistent estimator of $f(\cdot)$. In addition to this asymptotic result, Kronmal and Tarter devise a procedure for choosing an $m$ which, for a given sample size, minimizes the MISE.
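The cosine series estimator described above can be sketched in a few lines. This is only an illustration under stated assumptions: the coefficient normalization $\hat c_k = \frac{2}{n(b-a)}\sum_j \cos\frac{k\pi(X_j-a)}{b-a}$ is assumed, the sample is made up, and the function names (`cosine_coefficients`, `cosine_density_estimate`, `simpson`) are hypothetical rather than taken from Kronmal and Tarter.

```python
import math

def cosine_coefficients(sample, a, b, m):
    """Return [c_hat_0, ..., c_hat_m] for the cosine system on [a, b]."""
    n = len(sample)
    width = b - a
    coeffs = []
    for k in range(m + 1):
        # Observations outside [a, b] contribute nothing (indicator function).
        total = sum(math.cos(k * math.pi * (x - a) / width)
                    for x in sample if a <= x <= b)
        coeffs.append(2.0 * total / (n * width))
    return coeffs

def cosine_density_estimate(x, coeffs, a, b):
    """Evaluate c_hat_0 / 2 + sum_k c_hat_k cos(k pi (x - a) / (b - a))."""
    width = b - a
    value = coeffs[0] / 2.0
    for k in range(1, len(coeffs)):
        value += coeffs[k] * math.cos(k * math.pi * (x - a) / width)
    return value

def simpson(func, lo, hi, evaluations=201):
    """Composite Simpson's rule; `evaluations` must be odd."""
    intervals = evaluations - 1
    h = (hi - lo) / intervals
    total = func(lo) + func(hi)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * func(lo + i * h)
    return total * h / 3.0

# Made-up sample on [0, 1], truncation point m = 3.
sample = [0.10, 0.35, 0.40, 0.62, 0.90]
coeffs = cosine_coefficients(sample, 0.0, 1.0, 3)
area = simpson(lambda x: cosine_density_estimate(x, coeffs, 0.0, 1.0), 0.0, 1.0)
```

When every observation lies in $[a,b]$, the constant term equals $1/(b-a)$ and each cosine term integrates to zero over $[a,b]$, so the estimate integrates to one whatever the choice of $m$. It is not guaranteed to be nonnegative.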

Approaches for estimating $f(\cdot)$ using different orthogonal systems of functions have also been considered. For example, Schwartz (1967) has investigated the use of Hermite polynomials. In the present study, however, our principal interest will be in the trigonometric systems because of their close association with ARMA approximators and estimators.

1.3  Autoregressive  Density  Estimation 

Carmichael (1976) has adapted the idea of autoregressive spectral estimation to the estimation of a probability density. In order to briefly outline Carmichael's method, let $f(\cdot)$ be the pdf of a random variable $X$ with support $[-\pi,\pi]$. Define $R(\cdot)$ by

$$R(v) = \int_{-\pi}^{\pi} e^{-ivx} f(x)\,dx, \quad |v| = 0,1,2,\ldots.$$

Let $(a_{1m}, a_{2m}, \ldots, a_{mm})$ be defined as the solution (assumed unique) of the following system of Yule-Walker equations:

$$\begin{bmatrix} 1 & R(-1) & \cdots & R(-m+1) \\ R(1) & 1 & \cdots & R(-m+2) \\ \vdots & \vdots & & \vdots \\ R(m-1) & R(m-2) & \cdots & 1 \end{bmatrix} \begin{bmatrix} a_{1m} \\ a_{2m} \\ \vdots \\ a_{mm} \end{bmatrix} = \begin{bmatrix} R(1) \\ R(2) \\ \vdots \\ R(m) \end{bmatrix}.$$

The $m$th order autoregressive approximator of $f(\cdot)$ is then

$$f_m(x) = \frac{k_m}{2\pi}\cdot\frac{1}{\left|1 - a_{1m}e^{ix} - \cdots - a_{mm}e^{imx}\right|^2},$$

where $k_m$ is chosen so that $R(0) = 1$.

The term approximator is appropriate since it can be shown that

$$\int_{-\pi}^{\pi} e^{-ivx} f_m(x)\,dx = R(v), \quad |v| = 0,1,\ldots,m.$$

When observations are available from $f(\cdot)$, an estimator $\hat f_m(\cdot)$ can be similarly obtained by first estimating $R(\cdot)$.
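The construction just outlined can be sketched as follows. Two points are assumptions rather than statements from the source: the approximator is taken to have the autoregressive form $\frac{k_m}{2\pi}|1 - a_{1m}e^{ix} - \cdots - a_{mm}e^{imx}|^{-2}$, and the constant is computed as $k_m = 1 - \sum_j a_{jm}R(-j)$, a normalization worked out for this sketch. All function names are hypothetical.

```python
import cmath
import math

def solve_linear(matrix, rhs):
    """Solve a small complex linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [[complex(v) for v in matrix[i]] + [complex(rhs[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-14:
            raise ValueError("Yule-Walker system is (numerically) singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[row][c] -= factor * aug[col][c]
    sol = [0j] * n
    for row in range(n - 1, -1, -1):
        acc = aug[row][n] - sum(aug[row][c] * sol[c] for c in range(row + 1, n))
        sol[row] = acc / aug[row][row]
    return sol

def empirical_R(sample):
    """R_hat(v) = (1/n) sum_j exp(-i v X_j), the sample analogue of R(v)."""
    n = len(sample)
    return lambda v: sum(cmath.exp(-1j * v * x) for x in sample) / n

def ar_approximator(R, m):
    """Return (a, k_m, f_m) for the order-m autoregressive approximator.

    R is a function v -> R(v), needed for |v| <= m, with R(0) = 1.
    """
    # Row v, column j of the Yule-Walker matrix is R(v - j).
    matrix = [[R(v - j) for j in range(1, m + 1)] for v in range(1, m + 1)]
    rhs = [R(v) for v in range(1, m + 1)]
    a = solve_linear(matrix, rhs)
    # Assumed normalization: k_m = 1 - sum_j a_j R(-j).
    k_m = (1 - sum(a[j - 1] * R(-j) for j in range(1, m + 1))).real

    def f_m(x):
        poly = 1 - sum(a[j - 1] * cmath.exp(1j * j * x) for j in range(1, m + 1))
        return k_m / (2 * math.pi * abs(poly) ** 2)

    return a, k_m, f_m

# Check on a density that is exactly autoregressive of order one:
# R(v) = rho^|v| corresponds to (1 - rho^2) / (2 pi |1 - rho e^{ix}|^2).
rho = 0.5
a, k_m, f_m = ar_approximator(lambda v: rho ** abs(v), 2)
```

Passing `empirical_R(sample)` in place of the exact `R` gives the corresponding estimator. In the check above the second-order fit returns $a_{12} = \rho$, $a_{22} = 0$, so the approximator reproduces the density exactly.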

Carmichael provides two motivations for the approach just outlined. One motivation involves regarding $\{R(v): |v| = 0,1,\ldots\}$ as the correlation sequence of a complex-valued, stationary time series. The spectral density $f(\cdot)$ of this hypothetical time series is approximated by the $m$th order autoregressive scheme $f_m(\cdot)$. Another motivation follows from showing the equivalence of $f_m(\cdot)$ to an approximator formed by constructing a set of polynomials in $e^{ix}$ which are orthogonal with respect to the inner product

$$(g,h) = \int_{-\pi}^{\pi} g(e^{ix})\,\overline{h(e^{ix})}\,f(x)\,dx.$$


The weak consistency of $\hat f_m(\cdot)$ as an estimator of $f(\cdot)$ has also been established by Carmichael. This result may be stated as follows. Let $X_1, \ldots, X_n$ be a random sample from $f(\cdot)$ and

$$\hat R(v) = \frac{1}{n}\sum_{j=1}^{n} e^{-ivX_j}.$$

Then $\hat f_m(\cdot)$ is formed by replacing $R(\cdot)$ by $\hat R(\cdot)$ in the system of equations presented previously. If $f(\cdot)$ satisfies certain regularity conditions and

$$\lim_{n\to\infty} \frac{m^{3/2}}{\sqrt{n}} = 0,$$

then

$$\left|\hat f_m(x) - f_\infty(x)\right| \xrightarrow{\;p\;} 0 \quad \text{uniformly in } x, \text{ where } f_\infty(x) = f(x) \text{ a.e. } [-\pi,\pi].$$

Parzen (1979) proposes an additional application of autoregressive representations in the estimation of density-quantile, or $fQ$, functions, where $f(\cdot)$ and $Q(\cdot)$ are, respectively, the probability density and quantile function of a random variable $X$ and

$$fQ(u) = f(Q(u)), \quad 0 \le u \le 1.$$

Although density-quantile estimation will not be investigated in this work, the ARMA method is easily adapted to this problem. It is hoped that some of the forthcoming observations pertaining to density estimation will find applications in the estimation of $fQ$ and other types of functions, such as hazard functions.


CHAPTER II

THE DETERMINISTIC SETTING: $f_{p,q}(\cdot)$ AS AN APPROXIMATOR OF $f(\cdot)$

2.1  Definitions  and  Assumptions 

In the current chapter we will consider the problem of approximating a function using a finite number of its Fourier coefficients. To facilitate our discussion the following definitions and assumptions are stated. The notation presented here will be followed consistently throughout the remainder of this work.

(i) $f(\cdot)$ denotes a real-valued function with domain of definition $[-\pi,\pi]$, which we wish to approximate or estimate. Unless otherwise stated, it shall be assumed that $f(\cdot)$ is square integrable on $[-\pi,\pi]$, i.e.,

$$\int_{-\pi}^{\pi} f^2(x)\,dx < \infty.$$

(ii) The sequence $\{\phi(v): |v| = 0,1,2,\ldots\}$ of Fourier coefficients of $f(\cdot)$ is defined as

$$\phi(v) = \int_{-\pi}^{\pi} e^{-ivx} f(x)\,dx, \quad |v| = 0,1,\ldots.$$

Under the integrability condition in (i), $|\phi(v)|$ is finite for all $v$. Note that if $f(\cdot)$ is a probability density function, $\phi(\cdot)$ is simply its characteristic function evaluated at the integers.

(iii) Unless stated to the contrary, it will be assumed that $f(\cdot)$ satisfies conditions which ensure that

$$f(x) = \frac{1}{2\pi}\sum_{v=-\infty}^{\infty} \phi(v)e^{ivx} \quad \text{a.e. } [-\pi,\pi].$$

One such set of conditions (see Apostol (1973)) is that $f(\cdot)$ be continuous and of bounded variation throughout $[-\pi,\pi]$.

(iv) $f(\cdot)$ will be said to have an ARMA representation iff

$$f(x) = \frac{\sum_{v=-q}^{q} \beta_v e^{ivx}}{\left|1 - \alpha_1 e^{ix} - \cdots - \alpha_p e^{ipx}\right|^2} \quad \text{a.e.},$$

where $p$ and $q$ are non-negative integers, $\beta_v$ $(|v| = 0,1,\ldots,q)$ and $\alpha_k$ $(k = 1,\ldots,p)$ are complex constants with $\beta_{-v} = \bar{\beta}_v$, and the roots of $1 - \alpha_1 x - \cdots - \alpha_p x^p = 0$ all lie outside the unit circle.

2.2 Discussion and Definition of $f_{p,q}(\cdot)$

Before moving to the stochastic setting, the ARMA method will be motivated by demonstrating its value as a deterministic approximation scheme. In the current section the ARMA approximator $f_{p,q}(\cdot)$ is defined and shown to be related to the $e_n$-transform. In Section 2.3 truncated Fourier series, autoregressive, and ARMA approximators will be compared as to their ability to approximate a function $f(\cdot)$. Comparisons will be made on the basis of how well the approximator fits $f(\cdot)$ visually, and also by means of the measure

$$\mathrm{ISE}(f^*) = \int_{-\pi}^{\pi} \left(f^*(x) - f(x)\right)^2 dx,$$

where $f^*(\cdot)$ approximates $f(\cdot)$.

Given the Fourier coefficients $\phi(0), \phi(1), \ldots, \phi(m)$ (note $\phi(-v) = \overline{\phi(v)}$) of a function $f(\cdot)$ with a series representation as in the previous section, the most obvious choice for an approximator of $f(x)$ is

$$f_m(x) = \frac{1}{2\pi}\sum_{v=-m}^{m} \phi(v)e^{ivx}.$$

The error associated with this approximation is

$$|f(x) - f_m(x)| = \frac{1}{2\pi}\left|\sum_{|v|>m} \phi(v)e^{ivx}\right|,$$

which can be made arbitrarily small by choosing $m$ large enough. The convergence of $f_m(\cdot)$ to $f(\cdot)$ is uniform if $f(\cdot)$ is continuous and of bounded variation (see Apostol (1973)). In addition to the pointwise error of $f_m(\cdot)$, we have, by Parseval's theorem,

$$\mathrm{ISE}(f_m) = \frac{1}{\pi}\sum_{v=m+1}^{\infty} |\phi(v)|^2.$$
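As a numerical check of the Parseval expression, the sketch below compares $\frac{1}{\pi}\sum_{v>m}|\phi(v)|^2$ with a direct Simpson's rule evaluation of the ISE. The test density, with $\phi(v) = \rho^{|v|}$, is an assumption chosen for this illustration because its tail sum has the closed form $\rho^{2(m+1)}/(\pi(1-\rho^2))$. The function names are hypothetical.

```python
import cmath
import math

def fourier_approximator(phi, m):
    """Return f_m(x) = (1 / 2 pi) * sum_{|v| <= m} phi(v) exp(i v x)."""
    def f_m(x):
        total = sum(phi(v) * cmath.exp(1j * v * x) for v in range(-m, m + 1))
        return total.real / (2 * math.pi)
    return f_m

def simpson(func, lo, hi, evaluations=201):
    """Composite Simpson's rule; `evaluations` must be odd."""
    intervals = evaluations - 1
    h = (hi - lo) / intervals
    total = func(lo) + func(hi)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * func(lo + i * h)
    return total * h / 3.0

rho = 0.5
m = 3

def phi(v):
    # Fourier coefficients of the test density below.
    return rho ** abs(v)

def f(x):
    # Density on [-pi, pi] whose Fourier coefficients are rho^|v|.
    return (1 - rho ** 2) / (2 * math.pi * (1 - 2 * rho * math.cos(x) + rho ** 2))

f_m = fourier_approximator(phi, m)
ise_numeric = simpson(lambda x: (f_m(x) - f(x)) ** 2, -math.pi, math.pi)
ise_parseval = rho ** (2 * (m + 1)) / (math.pi * (1 - rho ** 2))
```

The two values agree to well within the quadrature error, which is very small here because the integrand is smooth and periodic.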


In certain applications or for certain functions, a suitable choice for $m$ may be prohibitively large. In other words, $f_{m_0}(\cdot)$ based upon a reasonable number $m_0$ of Fourier coefficients may not provide an adequate approximation to $f(\cdot)$. Suppose, however, that $\phi(m_0+1), \phi(m_0+2), \ldots$ are in some sense related to the previous Fourier coefficients. It may then be possible to exploit this relationship and construct an approximator based on $\phi(0), \phi(1), \ldots, \phi(m_0)$ which has better error properties than does $f_{m_0}(\cdot)$.

A model for the relationship between the Fourier coefficients of $f(\cdot)$ which is often at least approximately satisfied is

$$\{\phi(v)\} \in L(p,\Delta) \quad \text{for } v > q, \tag{2.1}$$

where $\{f_m\} \in L(n,\Delta)$ for $m > m_0$ if there exists a smallest integer $n > 0$ and a set of $c_i$'s such that

$$f_m + c_1 f_{m-1} + \cdots + c_n f_{m-n} = 0, \quad m > m_0.$$

In the following theorem we establish the equivalence of functions whose Fourier coefficients satisfy (2.1) and functions having ARMA representations.

Theorem 2.1 Suppose the roots of

$$1 - \alpha_1 x - \cdots - \alpha_p x^p = 0$$

all lie outside the unit circle. Then $f(\cdot)$ has an ARMA representation of the form

$$f(x) = \frac{\sum_{v=-q}^{q} \beta_v e^{ivx}}{\left|1 - \alpha_1 e^{ix} - \cdots - \alpha_p e^{ipx}\right|^2} \quad \text{a.e. } [-\pi,\pi]$$

iff $\phi(v) - \alpha_1\phi(v-1) - \cdots - \alpha_p\phi(v-p) = 0$, $v > q$.


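Proof: Suppose first that $\phi(v)$ satisfies the prescribed difference equation. Now consider the function $f^*_{p,q}(\cdot)$ satisfying

$$f^*_{p,q}(x) = \frac{\sum_{v=-q}^{q} \beta_v e^{ivx}}{\left|1 - \alpha_1 e^{ix} - \cdots - \alpha_p e^{ipx}\right|^2} \quad \text{for } x \in [-\pi,\pi],$$

where the $\beta_v$ are chosen so that

$$\int_{-\pi}^{\pi} e^{-ijx} f^*_{p,q}(x)\,dx = \phi^*_{p,q}(j) = \phi(j) \quad \text{for } |j| = 0,1,\ldots,q.$$

The system of $2q+1$ equations which must be solved to find the $\beta_v$ is readily seen to be linear, and it is tacitly assumed that the system has a solution.

Now consider, for $v > q$,

$$\begin{aligned}
\phi^*_{p,q}(v) &- \alpha_1\phi^*_{p,q}(v-1) - \cdots - \alpha_p\phi^*_{p,q}(v-p) \\
&= \int_{-\pi}^{\pi} \left(e^{-ivx} - \alpha_1 e^{-i(v-1)x} - \cdots - \alpha_p e^{-i(v-p)x}\right) f^*_{p,q}(x)\,dx \\
&= \int_{-\pi}^{\pi} e^{-ivx}\left(1 - \alpha_1 e^{ix} - \cdots - \alpha_p e^{ipx}\right) f^*_{p,q}(x)\,dx \\
&= \int_{-\pi}^{\pi} \frac{e^{-ivx}e^{iqx}\sum_{k=-q}^{q} \beta_k e^{-ix(q-k)}}{1 - \bar{\alpha}_1 e^{-ix} - \cdots - \bar{\alpha}_p e^{-ipx}}\,dx \\
&= -i\oint_{|z|=1} \frac{z^{v-q-1}\sum_{k=-q}^{q} \beta_k z^{q-k}}{1 - \bar{\alpha}_1 z - \cdots - \bar{\alpha}_p z^p}\,dz,
\end{aligned}$$

where the last step uses the substitution $z = e^{-ix}$ and the contour is traversed counterclockwise.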

Since $v > q$ and the roots of $1 - \alpha_1 z - \cdots - \alpha_p z^p = 0$ (and therefore those of the polynomial with conjugated coefficients) are outside the unit circle, it follows that the above integrand is analytic on and inside the unit circle. Thus, by the Cauchy-Goursat theorem the integral is zero. It follows that $\phi^*_{p,q}(v)$ satisfies the same difference equation as $\phi(v)$ for $v > q$. Since $\phi(v) = \phi^*_{p,q}(v)$ for $v = 0,1,2,\ldots,q$ we must then have $\phi(v) = \phi^*_{p,q}(v)$ for $|v| = 0,1,2,\ldots$. By the uniqueness of the Fourier coefficients of square integrable functions (and it is easily shown that a function having an ARMA representation is square integrable), it follows that

$$f(x) = f^*_{p,q}(x) \quad \text{a.e. } [-\pi,\pi].$$

One part of the theorem is thus proven. By mimicking a portion of the above argument it is easily shown that

$$\phi(v) - \alpha_1\phi(v-1) - \cdots - \alpha_p\phi(v-p) = 0, \quad v > q,$$

whenever $f(\cdot)$ has the stated ARMA representation.
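As a worked instance of Theorem 2.1 (an illustration added here, with $\rho$ an assumed real constant, $0 < |\rho| < 1$), take $p = 1$ and $q = 0$ with $\alpha_1 = \rho$. The root of $1 - \rho x = 0$ is $1/\rho$, which lies outside the unit circle. The function

$$f(x) = \frac{1}{2\pi}\cdot\frac{1-\rho^2}{\left|1 - \rho e^{ix}\right|^2} = \frac{1}{2\pi}\sum_{v=-\infty}^{\infty} \rho^{|v|} e^{ivx}$$

has the ARMA form with $\beta_0 = (1-\rho^2)/2\pi$, and its Fourier coefficients are $\phi(v) = \rho^{|v|}$. The difference equation of the theorem is then immediate:

$$\phi(v) - \rho\,\phi(v-1) = \rho^{v} - \rho\cdot\rho^{v-1} = 0, \quad v > 0.$$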

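Implicit in Theorem 2.1 is a method for forming an approximator of $f(\cdot)$ in the situation where

$$\phi(v) - \alpha_1\phi(v-1) - \cdots - \alpha_p\phi(v-p) = 0, \quad v > q.$$

Given $\phi(0), \phi(1), \phi(-1), \ldots, \phi(-p-q), \phi(p+q)$, an approximator $f^*_{p,q}(\cdot)$ can be constructed by first solving the system of equations

$$\begin{aligned}
\alpha_1\phi(q) + \alpha_2\phi(q-1) + \cdots + \alpha_p\phi(q-p+1) &= \phi(q+1) \\
\alpha_1\phi(q+1) + \alpha_2\phi(q) + \cdots + \alpha_p\phi(q-p+2) &= \phi(q+2) \\
&\;\;\vdots \\
\alpha_1\phi(q+p-1) + \alpha_2\phi(q+p-2) + \cdots + \alpha_p\phi(q) &= \phi(q+p)
\end{aligned} \tag{2.2}$$

for $\alpha_1, \ldots, \alpha_p$ and then choosing the $\beta_v$ as in the proof of Theorem 2.1. The approximator so constructed has the property that

$$\phi^*_{p,q}(v) = \phi(v), \quad |v| = 0,1,\ldots,p+q. \tag{2.3}$$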


This property follows from the fact that by (2.2) and Theorem 2.1, $\phi(v)$ and $\phi^*_{p,q}(v)$ both satisfy the difference equation

$$y(v) - \alpha_1 y(v-1) - \cdots - \alpha_p y(v-p) = 0$$

for $v = q+1, \ldots, q+p$ subject to the initial conditions $y(v) = \phi(v)$, $|v| = 0,1,\ldots,q$.

Property (2.3) justifies the use of the term approximator for $f^*_{p,q}(\cdot)$ even when $\phi(\cdot)$ is not well modeled as the solution to a difference equation. The following error properties of $f^*_{p,q}(\cdot)$ are a simple consequence of (2.3):

$$\begin{aligned}
\left|f(x) - f^*_{p,q}(x)\right| &= \frac{1}{2\pi}\left|\sum_{|v|>p+q} \left(\phi(v) - \phi^*_{p,q}(v)\right)e^{ivx}\right|, \\
\mathrm{ISE}(f^*_{p,q}) &= \frac{1}{\pi}\sum_{v=p+q+1}^{\infty} \left|\phi(v) - \phi^*_{p,q}(v)\right|^2.
\end{aligned} \tag{2.4}$$
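For $p = 1$ the construction can be carried out by hand, which may help fix ideas (a worked case added for illustration, assuming $\phi(q) \neq 0$). The system of equations reduces to the single equation $\alpha_1\phi(q) = \phi(q+1)$, so that

$$\alpha_1 = \frac{\phi(q+1)}{\phi(q)},$$

and the root $1/\alpha_1$ of $1 - \alpha_1 x = 0$ lies outside the unit circle exactly when $|\phi(q+1)| < |\phi(q)|$. The Fourier coefficients of the approximator beyond lag $q$ are generated by the difference equation,

$$\phi^*_{1,q}(v) = \alpha_1^{\,v-q}\,\phi(q), \quad v > q,$$

which reproduces $\phi(q+1)$ at $v = q+1$ and gives

$$\mathrm{ISE}(f^*_{1,q}) = \frac{1}{\pi}\sum_{v=q+2}^{\infty} \left|\phi(v) - \alpha_1^{\,v-q}\phi(q)\right|^2 .$$

The approximation is therefore good to the extent that the coefficients decay geometrically beyond lag $q$.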


Although the method discussed above for constructing $f^*_{p,q}(\cdot)$ is informative, it can be quite cumbersome analytically. The approximator $f_{p,q}(\cdot)$ to be defined below will be shown to be identical to $f^*_{p,q}(\cdot)$ under the assumption that the roots of $1 - \alpha_1 x - \cdots - \alpha_p x^p = 0$ lie outside the unit circle. However, $f_{p,q}(\cdot)$ has the advantage of being much simpler to construct than $f^*_{p,q}(\cdot)$. In addition, the dependence of $f_{p,q}(\cdot)$ upon a numerical analysis tool known as the $e_n$-transform provides important insight into why the ARMA method is of value as an approximation scheme.

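Before defining $f_{p,q}(\cdot)$ we give the following definition of the $e_n$-transform.

Definition 2.1 Given the sequence $\{a_k, a_{k+1}, \ldots\}$ of complex numbers and the partial sums $A_j = \sum_{v=k}^{j} a_v$, we define (for $m \ge n+k-1$)

$$e_n(A_m) = \frac{\begin{vmatrix} A_{m-n} & A_{m-n+1} & \cdots & A_m \\ a_{m-n+1} & a_{m-n+2} & \cdots & a_{m+1} \\ \vdots & \vdots & & \vdots \\ a_m & a_{m+1} & \cdots & a_{m+n} \end{vmatrix}}{\begin{vmatrix} 1 & 1 & \cdots & 1 \\ a_{m-n+1} & a_{m-n+2} & \cdots & a_{m+1} \\ \vdots & \vdots & & \vdots \\ a_m & a_{m+1} & \cdots & a_{m+n} \end{vmatrix}}$$

whenever this quantity is defined. If both numerator and denominator are zero, then define $e_n(A_m) = e_{n-1}(A_m)$. If only the denominator is zero, then $e_n(A_m) = \infty$.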
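The case $n = 1$ shows what the transform does (a worked example added for illustration). Expanding the two $2 \times 2$ determinants and using $A_m = A_{m-1} + a_m$,

$$e_1(A_m) = \frac{A_{m-1}a_{m+1} - A_m a_m}{a_{m+1} - a_m} = A_{m-1} + \frac{a_m}{1 - a_{m+1}/a_m}.$$

The remainder after $A_{m-1}$ is thus replaced by the sum of a geometric series with first term $a_m$ and ratio $a_{m+1}/a_m$. For the geometric series $a_v = r^v$ $(v \ge 0,\ |r| < 1)$, with $A_{m-1} = (1 - r^m)/(1-r)$, this gives

$$e_1(A_m) = \frac{1 - r^m}{1-r} + \frac{r^m}{1-r} = \frac{1}{1-r},$$

the exact sum, for every $m \ge 0$.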

The important result associated with the $e_n$-transform is that in a wide class of problems $e_n(A_m)$ is a better approximation to $A_\infty$ than is $A_{m+n}$. With Definition 2.1 we are now in a position to define the approximator $f_{p,q}(\cdot)$.

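Definition 2.2 Let $\{\phi(k), \phi(k+1), \ldots\}$ be a sequence of Fourier coefficients of $f(\cdot)$. Then the approximator $f_{p,q}(\cdot)$ of $f(\cdot)$ is defined as

$$f_{p,q}(x) = \frac{1}{2\pi}\left[\phi(0) + 2\,\mathrm{Real}\left\{e_p(F_q(x)) - F_0(x)\right\}\right], \quad x \in [-\pi,\pi],$$

where

$$k = \begin{cases} q - p + 1, & q + 1 - p \le 0 \\ 1, & q + 1 - p > 0 \end{cases}
\qquad\text{and}\qquad
F_j(x) = \begin{cases} \sum_{v=k}^{j} \phi(v)e^{ivx}, & j \ge k \\ 0, & j < k. \end{cases}$$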

Since

$$f(x) = \frac{1}{2\pi}\sum_{v=-\infty}^{\infty} \phi(v)e^{ivx} = \frac{1}{2\pi}\left[\phi(0) + 2\,\mathrm{Real}\left(\sum_{v=1}^{\infty} \phi(v)e^{ivx}\right)\right],$$

$e_p(F_q(x)) - F_0(x)$ is seen to approximate $\sum_{v=1}^{\infty} \phi(v)e^{ivx}$. The extent to which $e_p(F_q(x)) - F_0(x)$ is a better approximator of this quantity than $\sum_{v=1}^{p+q} \phi(v)e^{ivx}$ depends upon the particular sequence $\{\phi(v)\}$. Conditions under which $e_n(A_m)$ converges more rapidly (as $m \to \infty$) to $\sum_{v=k}^{\infty} a_v$ than does $\sum_{v=k}^{n+m} a_v$ have been established by different authors, including Shanks (1955), McWilliams (1969), and Gray, Houston, and Morgan (1978). Except for Theorem 2.2, however, the discussion of these conditions will be postponed until Chapter VI. For the present, we simply note that they provide an important motivation for using the $e_n$-transform in situations where $\{a_m\}$ is not the solution of a difference equation.

The strongest result concerning the $e_n$-transform is the following.

Theorem 2.2 Suppose the complex sequence $\{a_m\}$ is an element of $L(n,\Delta)$ for $m > m_0$ and that the roots of the associated characteristic equation are outside the unit circle. Then

$$e_n(A_m) = \sum_{v=k}^{\infty} a_v \quad \text{for all } m \ge m_0.$$

Proof: See Gray, Houston, and Morgan (1978) for the case of $\{a_m\}$ real. The extension of the proof to include $\{a_m\}$ complex is trivial.

By applying the results of Theorems 2.1 and 2.2, the equivalence of $f^*_{p,q}(\cdot)$ and $f_{p,q}(\cdot)$ is easily shown. Morton (1981) has also proven this result in the context of power spectral density estimation.

Theorem 2.3 Let $f_{p,q}(\cdot)$, $f^*_{p,q}(\cdot)$, and $\alpha_1, \alpha_2, \ldots, \alpha_p$ be as defined previously, and suppose that the roots of $1 - \alpha_1 x - \cdots - \alpha_p x^p = 0$ are outside the unit circle. Then we have

$$f^*_{p,q}(x) \equiv f_{p,q}(x).$$

Proof: Let $k$ be as in Definition 2.2 and consider

$$F^*_j(x) = \sum_{v=k}^{j} \phi^*_{p,q}(v)e^{ivx},$$

where $\phi^*_{p,q}(v)$ is the $v$th Fourier coefficient of $f^*_{p,q}(\cdot)$. Since $\{\phi^*_{p,q}(v)\}$ satisfies a $p$th order difference equation for $v > q$, then so does $\{\phi^*_{p,q}(v)e^{ivx}\}$. Therefore, by Theorem 2.2, we have

$$e_p(F^*_q(x)) = \sum_{v=k}^{\infty} \phi^*_{p,q}(v)e^{ivx}.$$

This implies that

$$\frac{1}{2\pi}\left[\phi^*_{p,q}(0) + 2\,\mathrm{Real}\left\{e_p(F^*_q(x)) - F^*_0(x)\right\}\right] = \frac{1}{2\pi}\left[\phi^*_{p,q}(0) + 2\,\mathrm{Real}\left(\sum_{v=1}^{\infty} \phi^*_{p,q}(v)e^{ivx}\right)\right] = f^*_{p,q}(x).$$

However, since $\phi^*_{p,q}(v) = \phi(v)$ for $|v| = 0,1,\ldots,p+q$, it follows that

$$\frac{1}{2\pi}\left[\phi^*_{p,q}(0) + 2\,\mathrm{Real}\left\{e_p(F^*_q(x)) - F^*_0(x)\right\}\right] = \frac{1}{2\pi}\left[\phi(0) + 2\,\mathrm{Real}\left\{e_p(F_q(x)) - F_0(x)\right\}\right] = f_{p,q}(x).$$

Thus, $f_{p,q}(x) \equiv f^*_{p,q}(x)$.

Since $f^*_{p,q}(\cdot)$ and $f_{p,q}(\cdot)$ are equivalent under the condition (which shall henceforth be referred to as condition S) that the roots of $1 - \alpha_1 x - \cdots - \alpha_p x^p = 0$ lie outside the unit circle, it follows that $f_{p,q}(\cdot)$ satisfies the error properties of (2.4) under condition S. However, the following two important facts are noted at this time.

(a) If condition S is not satisfied, then $f^*_{p,q}(\cdot)$ and $f_{p,q}(\cdot)$ are not in general equivalent.

(b) If condition S is not satisfied, then neither $f^*_{p,q}(\cdot)$ nor $f_{p,q}(\cdot)$ possesses the property that its first $p + q + 1$ Fourier coefficients are equal to $\phi(0), \phi(1), \ldots, \phi(p+q)$.

Because of fact (b), it is not clear in what sense $f_{p,q}(\cdot)$ is approximating $f(\cdot)$ when condition S is not satisfied. When using $f_{p,q}(\cdot)$ for approximation purposes it is thus important always to verify whether or not this condition is met.

In concluding this section two special cases of $f_{p,q}(\cdot)$ are noted. When $p = 0$,

$$f_{0,q}(x) = \frac{1}{2\pi}\left[\phi(0) + 2\,\mathrm{Real}\left(\sum_{v=1}^{q} \phi(v)e^{ivx}\right)\right],$$

a Fourier series approximator, and when $q = 0$,

$$f_{p,0}(x) = \frac{k_p}{2\pi}\cdot\frac{1}{\left|1 - \alpha_1 e^{ix} - \cdots - \alpha_p e^{ipx}\right|^2},$$

an autoregressive approximator. The first of these two relationships follows trivially from the definition of $e_n(A_m)$. The second follows from the fact, proven by Pagano (1973), that condition S is always satisfied whenever $q = 0$ (assuming $\{\phi(v)\}$ is positive definite), and thus, by Theorem 2.3, $f_{p,0}(\cdot) \equiv f^*_{p,0}(\cdot)$. Autoregressive approximators have an advantage over ARMA approximators in that they always satisfy condition S, which of course implies that $\phi_{p,0}(v) = \phi(v)$ for $|v| = 0,1,\ldots,p$. However, as will be illustrated in the next and succeeding sections, there is much to be gained in considering $f_{p,q}(\cdot)$ for $q > 0$.


2.3 Examples Comparing Fourier Series, Autoregressive, and ARMA Approximators

By way of illustration we will now compare the Fourier series, autoregressive, and ARMA ($p > 0$ and $q > 0$) methods of approximating a function. Since these methods are of interest to us in the context of density estimation, the examples to follow involve, for the most part, functions which are commonly used as models for probability densities. Although they are certainly not exhaustive, the examples given serve to illustrate the value of the ARMA method as an approximation scheme.

Numerous additional examples already exist which show dramatically how the $e_n$-transform accelerates the rate of convergence of slowly convergent sequences, and in some cases induces convergence of divergent sequences (see Gray, Houston, and Morgan (1978)). Since the sequences $\{F_m(x)\}$ associated with the functions of this section are not what would usually be considered slowly convergent, the examples which follow are not as dramatic as those just mentioned, but are nonetheless interesting.

In our first example, we investigate how well the Fourier series and autoregressive methods fare in approximating a density for which there exists an error-free ARMA approximator. Consider the function $f^{(1)}(x)$, $x \in [-\pi,\pi]$, a mixture of two densities having ARMA representations.

[The formulas defining $f^{(1)}(\cdot)$ and its two component densities are illegible in the source. The legible fragments are the factors $|1 - (.80e^{i(\pi/4)})e^{ix}|^2$, $|1 + (.85i)e^{ix}|^2$, $|1 - .50e^{ix}|^2$, and $|1 - (.40e^{i(\pi/8)})e^{ix}|^2$.]


A result which will be proven in Chapter III is that the mixture of densities having ARMA representations itself has an ARMA representation. With this result it is easily verified that $f^{(1)}(\cdot)$ has an ARMA (2,3) representation. By the earlier results of this chapter, it then follows that the corresponding ARMA approximator is identical to $f^{(1)}(\cdot)$, or, in other words, $f^{(1)}(\cdot)$ is completely determined by its first five Fourier coefficients. Of interest, though, is a determination of how well the Fourier series and autoregressive approximation schemes perform in this situation.
In  Figures  2.1  and  2.2,  respectively,  the  Fourier  series 
and  autoregressive  approximators  based  on  (1)  , . . .  (5) 

have  been  plotted  with  f^(»).  Figure  2.3  shows  a  plot  of  ^q^qC* 
and  f^(*)»  and  in  Table  2.1a  comparison  of  ISE  is  given  for 
the  two  methods  being  considered.  The  ISE  for  each  approximator 
has  been  approximated  numerically  by  Simpson's  rule  using  201 
function  evaluations  on  [-ir,ir].  (The  ISE  in  all  the  examples  to 
follow  has  been  calculated  in  the  same  way.)  The  autoregressive 
method is seen to perform considerably better in this instance
than  does  the  Fourier  series  method.  In  a  visual  sense  fg1^*). 
f^g(»),...,fjj^Q(»)  are  virtually  indistinguisable  from  f^(») 
(and  hence  a  plot  of  ^as  ^een  fitted).  The  Fourier 

series approximators, however, have difficulty in resolving the peaks of $f^{(1)}(\cdot)$ without introducing spurious variation. This shortcoming is even more important in the stochastic setting where it is desirable to limit the cause of spurious variation in a fitted curve to sampling variability. In Chapter VI a data set is discussed which verifies the practical importance of densities such as $f^{(1)}(\cdot)$ which have rather sharp peaks.

TABLE 2.1

ISE COMPARISON FOR FOURIER SERIES AND AUTOREGRESSIVE APPROXIMATORS OF THE FUNCTION $f^{(1)}(\cdot)$

  k    ISE($f^{(1)}_{0,k}$)    ISE($f^{(1)}_{k,0}$)
       (Fourier series)        (autoregressive)
  5        .04216                  .00238
  6        .03332                  .00055
  7        .02574                  .00025
  8        .01495                  .00008
  9        .01307                  .00003
 10        .00865                  .00001
 11        .00577                  .00000
 12        .00487                  .00000
 13        .00299                  .00000
 14        .00230                  .00000
 15        .00174                  .00000

In our last three examples we compare the three different approximation schemes on functions which do not have ARMA representations. In this way the versatility of the ARMA method is investigated by examining its performance in situations which are other than ideal for it. The functions considered are


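$$f^{(2)}(x) = \frac{1224}{\pi}\left[\frac{x+\pi}{2\pi}\right]^{15}\left[1 - \frac{x+\pi}{2\pi}\right]^{2},$$

$$f^{(3)}(x) = 2e^{-4|x|}\, I_{[-\pi,\pi]}(x),$$

and

$$f^{(4)}(x) = 6\left[\frac{x+\pi}{2}\right]^{5}\exp\left\{-2\left[\frac{x+\pi}{2}\right]^{6}\right\}.$$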

The function $f^{(2)}(\cdot)$ is simply a Beta (16,3) density which has been shifted and rescaled so that its support is the interval $[-\pi,\pi]$. The second function, $f^{(3)}(\cdot)$, is a truncated double exponential (or Laplace) density, and $f^{(4)}(\cdot)$ is a Weibull density (with scale parameter 2 and shape parameter 6) which has been truncated at $\pi$ and then shifted and rescaled to have support $[-\pi,\pi]$. Since $f^{(3)}(\cdot)$ and $f^{(4)}(\cdot)$ exclude, respectively, only .00035% and less than 10^[exponent illegible]% of the area of the original densities, the comparisons to follow may be regarded as comparisons of the ARMA, Fourier series, and autoregressive density estimation methods in the absence of stochastic errors.
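The three test functions are easy to code, and doing so gives a quick check on the description above. The sketch below assumes the forms $f^{(2)}(x) = \frac{1224}{\pi}u^{15}(1-u)^2$ with $u = (x+\pi)/2\pi$, $f^{(3)}(x) = 2e^{-4|x|}$ on $[-\pi,\pi]$, and $f^{(4)}(x) = 6t^5e^{-2t^6}$ with $t = (x+\pi)/2$. The function names and the number of quadrature points are choices made for this illustration.

```python
import math

def f2(x):
    """Beta(16, 3) density shifted and rescaled to [-pi, pi]."""
    u = (x + math.pi) / (2 * math.pi)
    return (1224 / math.pi) * u ** 15 * (1 - u) ** 2

def f3(x):
    """Double exponential (Laplace) density with rate 4, truncated to [-pi, pi]."""
    return 2 * math.exp(-4 * abs(x)) if abs(x) <= math.pi else 0.0

def f4(x):
    """Weibull density (assumed form 12 t^5 exp(-2 t^6)) after t = (x + pi) / 2."""
    t = (x + math.pi) / 2
    return 6 * t ** 5 * math.exp(-2 * t ** 6)

def simpson(func, lo, hi, evaluations=2001):
    """Composite Simpson's rule; `evaluations` must be odd."""
    intervals = evaluations - 1
    h = (hi - lo) / intervals
    total = func(lo) + func(hi)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * func(lo + i * h)
    return total * h / 3.0

area2 = simpson(f2, -math.pi, math.pi)
area3 = simpson(f3, -math.pi, math.pi)
area4 = simpson(f4, -math.pi, math.pi)

# Mass of the untruncated Laplace density lying outside [-pi, pi], in percent.
excluded3_percent = 100 * math.exp(-4 * math.pi)
```

Under these assumed forms the mass excluded by truncating the Laplace density is $e^{-4\pi}$, which rounds to the .00035% quoted in the text, and the mass excluded from the Weibull density is $e^{-2\pi^6}$, far too small to matter numerically.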

Pictured  in  Figures  2.4  -  2.12  are  plots  of  various 
approximators  along  with  the  functions  f^(«)>  i  *  2,3,4. 
Comparisons  of  ISE  are  given  in  Tables  2.2  -  2.4.  Both  visually 
and  in  terms  of  ISE,  the  ARMA  approximators  display  a  decided 
advantage  over  the  other  two  approximation  schemes.  A  hallmark 
of  the  ARMA  method  which  surfaces  in  these  three  examples  is 
the  ability  of  ARMA  approximators  to  correctly  fit  both  the 


TABLE  2.2 

ISE  COMPARISON  FOR  APPROXIMATORS  OF  THE  FUNCTION  f 


Function  fv  7  (•)  and  Fourier  Series  Approximator 


4 
TABLE 2.3

ISE COMPARISON FOR APPROXIMATORS OF THE FUNCTION $f^{(3)}(\cdot)$

  k      ISE       ISE       ISE

  1    .55890    .84191    .84191
  2    .35518    .02652    .31226
  3    .22480    .00403    .10221
  4    .14523    .00602    .08036
  5    .09675    .00613    .01784
  6    .06662    .00517    .02481
  7    .04733    .00410    .00407
  8    .03460    .00319    .00969
  9    .02595    .00249    .00151
 10    .01989    .00196    .00472
 11    .01555    .00157    .00095
 12    .01237    .00127    .00274
 13    .00999    .00105    .00077
 14    .00818    .00088    .00180
 15    .00678    .00075    .00067

.257 
TABLE 2.4

ISE COMPARISONS FOR APPROXIMATORS OF THE FUNCTION $f^{(4)}(\cdot)$

  k      ISE       ISE       ISE       ISE

  1    .44243   1.84975             1.84975
  2    .23112    .20863   1.22354   1.22354
  3    .10407    .03968    .11457    .92122
  4    .04104    .00804    .01615    .87385
  5    .01463    .00155    .00228    .68997
  6    .00488    .00028    .00030    .66224
  7    .00155    .00005    .00003    .28571
  8    .00047    .00001    .00000   1.27134
  9    .00013    .00000    .00000    .54683
 10    .00004    .00000    .00000    .90297


tails and the peak of a function. In Figures 2.4, 2.7, and 2.10
the Fourier series approximators are seen to correctly (or nearly
correctly) fit the peak of each function only at the expense of
incorrectly fitting the tails. By contrast, the ARMA approximators
of Figures 2.5, 2.8 and 2.11 (based in each case on the
same number of Fourier coefficients as the corresponding Fourier
series approximator) smooth out variation in the tails while
still correctly fitting the peaks.

The  autoregressive  method  performs  quite  well  on  the 
function  f^(»)  but  does  very  poorly  on  f^(»)  and  f^(«). 

This phenomenon can be explained quite simply by examining the
Fourier series representation of the approximator $f_{k,0}(\cdot)$. By
property (2.4) we have

$$|f(x) - f_{k,0}(x)| = \Big|\frac{1}{2\pi}\sum_{|v|>k}(\phi(v) - \phi_{k,0}(v))e^{ivx}\Big|$$

and

$$\mathrm{ISE}(f_{k,0}) = \frac{1}{\pi}\sum_{v=k+1}^{\infty}|\phi(v) - \phi_{k,0}(v)|^{2} .$$

The approximator $f_{k,0}(\cdot)$ obviously, then, performs poorly if it
does a poor job of extrapolating the Fourier coefficients $\phi(k+1)$,
$\phi(k+2)$, .... This is clearly what has occurred in the examples

Involving  f^(»)  and  f^(»).  Our  examples  seem  to  Indicate 
that, in general, fixing the autoregressive order and allowing
the moving average order to increase is the best scheme for
reducing the error inherent in Fourier series approximators.
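The same Parseval-type identity gives the ISE of a Fourier series approximator, for which the extrapolated coefficients beyond order k are simply zero. The following is a minimal sketch for the truncated double exponential example, assuming the density is 2 exp(-4|x|) on [-pi, pi]; the rate constant and the function names are assumptions made for illustration.

```python
import math

RATE = 4.0  # assumed Laplace rate, so that f3(x) = 2*exp(-4|x|) on [-pi, pi]

def phi(v: int) -> float:
    """Fourier coefficient: integral over [-pi, pi] of exp(-i v x) f3(x) dx.

    The coefficient is real because f3 is symmetric about zero.
    """
    damp = math.exp(-RATE * math.pi) * (-1) ** v
    return RATE ** 2 * (1.0 - damp) / (RATE ** 2 + v ** 2)

def energy() -> float:
    """Integral of f3(x)**2 over [-pi, pi]."""
    return (RATE / 4.0) * (1.0 - math.exp(-2.0 * RATE * math.pi))

def ise_fourier(k: int) -> float:
    """ISE of the order-k Fourier series approximator, by Parseval's identity.

    ISE = integral of f3**2 minus (1/2pi) * sum over |v| <= k of phi(v)**2,
    which equals (1/pi) * sum over v > k of phi(v)**2.
    """
    kept = phi(0) ** 2 + 2.0 * sum(phi(v) ** 2 for v in range(1, k + 1))
    return energy() - kept / (2.0 * math.pi)

for k in (1, 2, 3):
    print(k, round(ise_fourier(k), 5))
```

Under the stated assumption the values for k = 1 and k = 2 come out near .559 and .355, close to the first numeric column of Table 2.3.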


Having examined the advantages of using the ARMA (as
opposed to Fourier series or autoregressive) approximation
method, the remainder of this work is devoted to an investigation
of the ARMA method in the stochastic setting of probability
density estimation.


CHAPTER  III 


SMALL  SAMPLE  PROPERTIES  OF  ARMA  DENSITY  ESTIMATORS 

3.1  Introduction 

We now formally begin our study of probability density
estimation via ARMA representations. In the current chapter
we introduce the estimation problem and define an ARMA estimator
$\hat f_{p,q}(\cdot)$. Alternative ways of expressing $\hat f_{p,q}(\cdot)$ are derived
which serve to motivate ARMA estimators and show explicitly
their relationship to Fourier series estimators. The main result
of this chapter, however, will be establishing the relationship
between $\hat f_{p,q}(\cdot)$ and the generalized jackknife statistic. It will
be shown that ARMA estimators employ an adaptive, higher order
generalized jackknife scheme.

Chapter III is concluded with a result concerning the mixture
of densities having ARMA representations. The mixture of autoregressive
densities is seen, in general, to be an ARMA density.
This result conveys the necessity of ARMA representations to a
theory based on the representation of densities by autoregressive
schemes.

3.2 Definition of the Estimation Problem and $\hat f_{p,q}(\cdot)$

Suppose Y is a random variable with continuous probability
density function $g(\cdot)$ and that a random sample $Y_1,\ldots,Y_n$ is obtained
from $g(\cdot)$. In the remainder of this work we shall be concerned with
the problem of estimating the function $g(\cdot)$.

All theoretical results will be based upon the assumption
that Y has the finite support [a,b]. To be consistent with previous
notation, we shall in this situation consider estimating the
density $f(\cdot)$ of the random variable

$$X = 2\pi\,\frac{Y-a}{b-a} - \pi ,$$

which has support $[-\pi,\pi]$. As before it is also assumed that $f(\cdot)$
has the Fourier series representation

$$f(x) = \frac{1}{2\pi}\sum_{v=-\infty}^{\infty}\phi(v)e^{ivx}, \quad \text{a.e.}$$


Tapia and Thompson (1978) note that the finite support
assumption is only a small liability in practice since, in the
absence of any prior information about $g(\cdot)$, it would be unreasonable
to estimate the density outside the range of the data.
If the support of Y is indeed infinite, or unknown, then a and
b may be replaced, for a given data set, by $y_{(0)}$ and $y_{(n+1)}$,
where

(i) $y_{(0)}$ and $y_{(n+1)}$ are "natural" minimum and maximum
values for the random variable Y, or

(ii) $y_{(0)} = y_{(1)}$ and $y_{(n+1)} = y_{(n)}$ ($y_{(i)}$ denotes the
ith order statistic of the random sample $y_1,\ldots,y_n$).

The density $g(\cdot)$ is then estimated over the interval $[y_{(0)}, y_{(n+1)}]$
by first estimating an associated $f(\cdot)$ over the interval $[-\pi,\pi]$
using the transformed sample

$$x_j = 2\pi\,\frac{y_j - y_{(0)}}{y_{(n+1)} - y_{(0)}} - \pi , \quad j = 1,\ldots,n .$$
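The rescaling of a sample onto [-pi, pi] can be sketched as follows. This is an illustration only: it assumes the linear map suggested by the text, and the function name `to_interval` and its arguments are hypothetical. When no endpoints are supplied, the sample minimum and maximum are used, which corresponds to choice (ii).

```python
import numpy as np

def to_interval(y, lower=None, upper=None):
    """Map observations linearly from [lower, upper] onto [-pi, pi].

    If an endpoint is omitted, the corresponding sample extreme is used
    in its place (choice (ii) in the text).
    """
    y = np.asarray(y, dtype=float)
    lo = y.min() if lower is None else float(lower)
    hi = y.max() if upper is None else float(upper)
    if hi <= lo:
        raise ValueError("upper endpoint must exceed lower endpoint")
    return 2.0 * np.pi * (y - lo) / (hi - lo) - np.pi

x = to_interval([1.0, 2.0, 4.0])
print(x)
```

Supplying "natural" endpoints, as in choice (i), only requires passing `lower` and `upper` explicitly.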
Under the finite support assumption $f(\cdot)$ is characterized
by the Fourier coefficients

$$\phi(v) = \int_{-\pi}^{\pi} e^{-ivx}\,dF(x), \quad v = 1,2,\ldots,$$

where $F(\cdot)$ is the cumulative distribution function (cdf) of X.
Given a random sample $X_1,\ldots,X_n$ from $f(\cdot)$ we shall estimate $\phi(v)$
by forming an appropriate functional of the empirical cdf $F_n(\cdot)$,
i.e.

$$\hat\phi(v) = \int_{-\pi}^{\pi} e^{-ivx}\,dF_n(x) = \frac{1}{n}\sum_{j=1}^{n} e^{-ivX_j}, \quad v = 1,2,\ldots$$

The empirical characteristic function $\hat\phi(v)$ is obviously unbiased
for $\phi(v)$ and also possesses the following easily established
properties (see Tarter and Kronmal (1970)):

$$\mathrm{var}(\hat\phi(v)) = \frac{1}{n}\,(1 - |\phi(v)|^{2})$$

$$\mathrm{cov}(\hat\phi(v_1),\hat\phi(v_2)) = E(\hat\phi(v_1)\overline{\hat\phi(v_2)}) - \phi(v_1)\overline{\phi(v_2)} = \frac{1}{n}\,[\phi(v_1-v_2) - \phi(v_1)\phi(-v_2)], \quad v_1 \ne v_2 . \qquad (3.1)$$
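A minimal sketch of the empirical Fourier coefficients, together with the Fourier series estimator built from them, is given below. The names `phi_hat` and `fourier_series_estimate` are hypothetical, and the von Mises sample is used only as convenient test data supported on [-pi, pi].

```python
import numpy as np

def phi_hat(sample, v):
    """Empirical Fourier coefficient (1/n) * sum_j exp(-i v X_j)."""
    return np.mean(np.exp(-1j * v * np.asarray(sample, dtype=float)))

def fourier_series_estimate(sample, q, grid):
    """Order-q Fourier series density estimate evaluated on grid.

    Uses (1/2pi) * [1 + 2 Real( sum_{v=1}^{q} phi_hat(v) e^{ivx} )].
    """
    grid = np.asarray(grid, dtype=float)
    total = np.zeros_like(grid, dtype=complex)
    for v in range(1, q + 1):
        total += phi_hat(sample, v) * np.exp(1j * v * grid)
    return (1.0 + 2.0 * total.real) / (2.0 * np.pi)

rng = np.random.default_rng(0)
sample = rng.vonmises(0.0, 2.0, size=500)      # illustrative data on [-pi, pi]
grid = np.linspace(-np.pi, np.pi, 400, endpoint=False)
estimate = fourier_series_estimate(sample, 5, grid)
```

The estimate integrates to one by construction, since every term with v >= 1 integrates to zero over a full period; it is not guaranteed to be nonnegative.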


For the situation where the support of the original random
variable Y is infinite or unknown we have

$$\hat\phi(v) = \frac{1}{n}\sum_{j=1}^{n}\exp\Big(-iv\Big[2\pi\,\frac{Y_j - Y_{(0)}}{Y_{(n+1)} - Y_{(0)}} - \pi\Big]\Big) .$$

In this case $\hat\phi(v)$ is unbiased for the parameter

$$\phi^{*}(v) = E(e^{-ivX}) = \int_{-\pi}^{\pi} e^{-ivx} f(x)\,dx ,$$

where $f(\cdot)$ is the density of the random variable

$$X = 2\pi\,\frac{Y - Y_{(0)}}{Y_{(n+1)} - Y_{(0)}} - \pi .$$

The density being estimated on $[-\pi,\pi]$ by the methods to be discussed
below is thus

$$f(x) = \frac{1}{2\pi}\sum_{v=-\infty}^{\infty}\phi^{*}(v)e^{ivx}$$

(where it is assumed that this Fourier series converges). We
note that if $Y_{(0)}$ and $Y_{(n+1)}$ are nonstochastic the properties in
(3.1) hold if $\phi(v)$ is replaced by $\phi^{*}(v)$.

We are now ready to define the ARMA estimator $\hat f_{p,q}(\cdot)$ of
$f(\cdot)$, where it is understood that $f(\cdot)$ arises in one of the two
ways described above.

Definition 3.1 Let $\{\hat\phi(k), \hat\phi(k+1), \ldots\}$ be a sequence of estimated
Fourier coefficients. Then the ARMA estimator $\hat f_{p,q}(\cdot)$ of $f(\cdot)$ is
defined by

$$\hat f_{p,q}(x) = \frac{1}{2\pi}\,[1 + 2\,\mathrm{Real}\{e_p(\hat F_q(x)) - \hat F_0(x)\}], \quad x \in [-\pi,\pi],$$

where

$$k = \begin{cases} q+1-p, & q+1-p \le 0 \\ 1, & q+1-p > 0 \end{cases}$$

and

$$\hat F_j(x) = \begin{cases} \sum_{v=k}^{j}\hat\phi(v)e^{ivx}, & j \ge k \\ 0, & j < k . \end{cases}$$

It is seen that $\hat f_{p,q}(\cdot)$ is simply the stochastic analog of the
approximator $f_{p,q}(\cdot)$. Just as in the deterministic setting we
have the two special cases

$$\hat f_{0,q}(x) = \frac{1}{2\pi}\Big[1 + 2\,\mathrm{Real}\Big(\sum_{v=1}^{q}\hat\phi(v)e^{ivx}\Big)\Big]$$

and

$$\hat f_{p,0}(x) \propto \frac{1}{2\pi\,|1 - \hat\alpha_1 e^{ix} - \cdots - \hat\alpha_p e^{ipx}|^{2}} ,$$

a Fourier series and autoregressive estimator respectively.


3.3 The Generalized Jackknife Property of $\hat f_{p,q}(\cdot)$


Schucany,  Gray  and  Owen  (1971)  introduced  a  generalized 
notion  of  the  jackknife  statistic  which  greatly  enhances  the 
effectiveness  of  the  jackknife  as  a  bias  reduction  tool.  Their 
work  exploits  the  specific  form  of  the  bias  expansion  of  an  estima¬ 
tor  and  gives  the  proper  notion  for  reapplication  of  the  jackknife. 

Following  Gray  and  Schucany  (1972)  the  generalized  jackknife 
may  be  defined  as  follows. 

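Definition 3.2 Let $\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_{k+1}$ be k + 1 estimators for $\theta$
based on the random sample $X_1,\ldots,X_n$. Further, let $a_{ij}$, $i = 1,\ldots,k$
and $j = 1,\ldots,k+1$, be real numbers satisfying

$$\begin{vmatrix} 1 & 1 & \cdots & 1 \\ a_{11} & a_{12} & \cdots & a_{1,k+1} \\ \vdots & \vdots & & \vdots \\ a_{k1} & a_{k2} & \cdots & a_{k,k+1} \end{vmatrix} \ne 0 . \qquad (3.2)$$

Then the generalized jackknife $G(\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_{k+1})$ is defined by

$$G(\hat\theta_1,\hat\theta_2,\ldots,\hat\theta_{k+1}) = \frac{\begin{vmatrix} \hat\theta_1 & \hat\theta_2 & \cdots & \hat\theta_{k+1} \\ a_{11} & a_{12} & \cdots & a_{1,k+1} \\ \vdots & \vdots & & \vdots \\ a_{k1} & a_{k2} & \cdots & a_{k,k+1} \end{vmatrix}}{\begin{vmatrix} 1 & 1 & \cdots & 1 \\ a_{11} & a_{12} & \cdots & a_{1,k+1} \\ \vdots & \vdots & & \vdots \\ a_{k1} & a_{k2} & \cdots & a_{k,k+1} \end{vmatrix}} .$$
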

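A simple form for the bias of the generalized jackknife is obtained
in the following theorem.

Theorem 3.1 If

$$E(\hat\theta_j) = \theta + \sum_{i=1}^{\infty} h_{ij}(n)\,b_i(\theta), \quad j = 1,2,\ldots,k+1$$

and (3.2) is satisfied with $a_{ij} = h_{ij}(n)$, then

$$E[G(\hat\theta_1,\hat\theta_2,\ldots,\hat\theta_{k+1})] = \theta + B_G(n,\theta) ,$$

where

$$B_G(n,\theta) = \frac{\begin{vmatrix} B_1 & B_2 & \cdots & B_{k+1} \\ h_{11}(n) & h_{12}(n) & \cdots & h_{1,k+1}(n) \\ \vdots & \vdots & & \vdots \\ h_{k1}(n) & h_{k2}(n) & \cdots & h_{k,k+1}(n) \end{vmatrix}}{\begin{vmatrix} 1 & 1 & \cdots & 1 \\ h_{11}(n) & h_{12}(n) & \cdots & h_{1,k+1}(n) \\ \vdots & \vdots & & \vdots \\ h_{k1}(n) & h_{k2}(n) & \cdots & h_{k,k+1}(n) \end{vmatrix}}$$

and

$$B_j = \sum_{i=k+1}^{\infty} h_{ij}(n)\,b_i(\theta), \quad j = 1,2,\ldots,k+1 .$$

Proof: See Gray and Schucany (1972).


An immediate corollary to Theorem 3.1 is that $G(\hat\theta_1,\hat\theta_2,\ldots,\hat\theta_{k+1})$
is unbiased for $\theta$ if

$$E(\hat\theta_j) = \theta + \sum_{i=1}^{k} h_{ij}(n)\,b_i(\theta), \quad j = 1,2,\ldots,k+1 .$$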
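The determinant ratio in Definition 3.2 is easy to evaluate numerically. The sketch below is an illustration under stated assumptions: the function name `generalized_jackknife` is hypothetical, and the example uses the familiar first-order case in which two estimators have biases b/n and b/(n-1), so that the corollary predicts an unbiased combination.

```python
import numpy as np

def generalized_jackknife(estimates, a):
    """Evaluate G(theta_1, ..., theta_{k+1}) as a ratio of determinants.

    estimates : the k+1 estimators of theta
    a         : k x (k+1) array of the constants a_ij
    """
    theta = np.asarray(estimates, dtype=float)
    a = np.atleast_2d(np.asarray(a, dtype=float))
    k = a.shape[0]
    if theta.shape != (k + 1,) or a.shape != (k, k + 1):
        raise ValueError("need k+1 estimates and a k x (k+1) array")
    denominator = np.linalg.det(np.vstack([np.ones(k + 1), a]))
    if denominator == 0.0:
        raise ValueError("condition (3.2) fails: denominator determinant is zero")
    numerator = np.linalg.det(np.vstack([theta, a]))
    return numerator / denominator

# Expected values of two estimators with biases b/n and b/(n-1).
theta_true, b, n = 3.0, 5.0, 10
expected = [theta_true + b / n, theta_true + b / (n - 1)]
combined = generalized_jackknife(expected, [[1.0 / n, 1.0 / (n - 1)]])
```

Because G is linear in the estimators, applying it to their expected values gives the expected value of G, which here reduces to the true parameter.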

In order to see the sense in which $\hat f_{p,q}(\cdot)$ is related to
the generalized jackknife recall that

$$\hat f_{p,q}(x) = \frac{1}{2\pi}\,[1 + 2\,\mathrm{Real}\{e_p(\hat F_q(x)) - \hat F_0(x)\}] ,$$

where

$$e_p(\hat F_q(x)) = \frac{\begin{vmatrix} \hat F_{q-p}(x) & \hat F_{q-p+1}(x) & \cdots & \hat F_q(x) \\ \hat\phi(q-p+1)e^{i(q-p+1)x} & \hat\phi(q-p+2)e^{i(q-p+2)x} & \cdots & \hat\phi(q+1)e^{i(q+1)x} \\ \vdots & \vdots & & \vdots \\ \hat\phi(q)e^{iqx} & \hat\phi(q+1)e^{i(q+1)x} & \cdots & \hat\phi(q+p)e^{i(q+p)x} \end{vmatrix}}{\begin{vmatrix} 1 & 1 & \cdots & 1 \\ \hat\phi(q-p+1)e^{i(q-p+1)x} & \hat\phi(q-p+2)e^{i(q-p+2)x} & \cdots & \hat\phi(q+1)e^{i(q+1)x} \\ \vdots & \vdots & & \vdots \\ \hat\phi(q)e^{iqx} & \hat\phi(q+1)e^{i(q+1)x} & \cdots & \hat\phi(q+p)e^{i(q+p)x} \end{vmatrix}} .$$

Now, if $G(\hat F_{q-p}(x), \hat F_{q-p+1}(x), \ldots, \hat F_q(x))$ is the statistic obtained by
replacing $\hat\phi(j)$ in the above determinants by fixed, known quantities,
then G is a generalized jackknife statistic. More importantly, we
note that $e_p(\hat F_q(x))$ and each of $\hat F_j(x)$, $j = q-p,\ldots,q$, are estimators
of

$$\sum_{v=k}^{\infty}\phi(v)e^{ivx}$$

(where it is assumed that the support of Y is finite and known),
and that $\hat F_j(x)$ has the bias expansion

$$E[\hat F_j(x)] - \sum_{v=k}^{\infty}\phi(v)e^{ivx} = -\sum_{v=j+1}^{\infty}\phi(v)e^{ivx} = -\sum_{v=1}^{\infty}\phi(v+j)e^{i(v+j)x} .$$

In the notation of Theorem 3.1, and allowing $h_{mj}(n)$ to depend on
unknown parameters, we then have

$$\phi(q-p-1+j+m)\,e^{i(q-p-1+j+m)x} = h_{mj}(n) ,$$

$j = 1,\ldots,p+1$, $m = 1,2,\ldots$ and $b_m(\theta) \equiv -1$.

The pth order $e_n$-transform of $\hat F_q(x)$, $e_p(\hat F_q(x))$, is thus seen to
be an adaptive, generalized jackknife statistic in the sense
that it employs estimates of the unknown terms $h_{mj}(n)$ in the
bias expansion of $\hat F_{q-p-1+j}(x)$. In other words, $e_p(\hat F_q(x))$ has
the same form as a generalized jackknife, but adapts itself to
a particular data set by estimating the unknown quantities $h_{mj}(n)$.

mj 

Of interest now is an expression for the bias of $e_p(\hat F_q(x))$.
We have

$$\mathrm{Bias}[e_p(\hat F_q(x))] = E[e_p(\hat F_q(x))] - \sum_{v=k}^{\infty}\phi(v)e^{ivx} = E\Big[e_p(\hat F_q(x)) - \sum_{v=k}^{\infty}\phi(v)e^{ivx}\Big] .$$

An easily proven property of the $e_n$-transform is

$$e_n(A_m + c) = e_n(A_m) + c ,$$

and thus

$$\mathrm{Bias}[e_p(\hat F_q(x))] = E\Big[e_p\Big(\hat F_q(x) - \sum_{v=k}^{\infty}\phi(v)e^{ivx}\Big)\Big] . \qquad (3.3)$$


Because of the fact that $e_p(\hat F_q(x))$ is nonlinear in $\hat F_{q-p}(x),\ldots,$
$\hat F_q(x)$, expression (3.3) cannot be simplified further. The explicit
form obtained in Theorem 3.1 for the bias of $G(\hat\theta_1,\ldots,\hat\theta_{k+1})$ is a
consequence of G being linear in $\hat\theta_1,\ldots,\hat\theta_{k+1}$. It is informative
to note, however, that if $G^{*}(\hat F_{q-p}(x),\ldots,\hat F_q(x))$ is the random
variable obtained by replacing $\hat\phi(j)$ by $\phi(j)$ in the definition of
$e_p(\hat F_q(x))$, we have (by Theorem 3.1)

$$E[G^{*}(\hat F_{q-p}(x),\ldots,\hat F_q(x))] - \sum_{v=k}^{\infty}\phi(v)e^{ivx} = e_p\Big(F_q(x) - \sum_{v=k}^{\infty}\phi(v)e^{ivx}\Big)$$

$$= \frac{\begin{vmatrix} -\sum_{v=q-p+1}^{\infty}\phi(v)e^{ivx} & -\sum_{v=q-p+2}^{\infty}\phi(v)e^{ivx} & \cdots & -\sum_{v=q+1}^{\infty}\phi(v)e^{ivx} \\ \phi(q-p+1)e^{i(q-p+1)x} & \phi(q-p+2)e^{i(q-p+2)x} & \cdots & \phi(q+1)e^{i(q+1)x} \\ \vdots & \vdots & & \vdots \\ \phi(q)e^{iqx} & \phi(q+1)e^{i(q+1)x} & \cdots & \phi(q+p)e^{i(q+p)x} \end{vmatrix}}{\begin{vmatrix} 1 & 1 & \cdots & 1 \\ \phi(q-p+1)e^{i(q-p+1)x} & \phi(q-p+2)e^{i(q-p+2)x} & \cdots & \phi(q+1)e^{i(q+1)x} \\ \vdots & \vdots & & \vdots \\ \phi(q)e^{iqx} & \phi(q+1)e^{i(q+1)x} & \cdots & \phi(q+p)e^{i(q+p)x} \end{vmatrix}} .$$

Therefore $e_p(\hat F_q(x))$ may be regarded as estimating a random variable
$G^{*}$ whose bias has the same form as that of a generalized jackknife.

As has been pointed out previously, the ARMA method is
especially effective in approximating functions whose Fourier
coefficients are well approximated by the solution of a linear,
homogeneous, difference equation with constant coefficients. Of
interest, then, is the bias of $e_p(\hat F_q(x))$ under the assumption that
the density $f(\cdot)$ has an ARMA(p,q) representation. Under this assumption
we have, by Theorem 2.2,

$$\sum_{v=k}^{\infty}\phi(v)e^{ivx} = e_p(F_q(x)) .$$

Thus, we have immediately that

$$\mathrm{Bias}[e_p(\hat F_q(x))] = E[e_p(\hat F_q(x)) - e_p(F_q(x))] .$$
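By Theorem 3.1 it follows that

$$E[G^{**}(\hat F_{q-p}(x),\ldots,\hat F_q(x))] = \sum_{v=k}^{\infty}\phi(v)e^{ivx}$$

where

$$G^{**}(\hat F_{q-p}(x),\ldots,\hat F_q(x)) = \frac{\begin{vmatrix} \hat F_{q-p}(x) & \hat F_{q-p+1}(x) & \cdots & \hat F_q(x) \\ \phi(q-p+1)e^{i(q-p+1)x} & \phi(q-p+2)e^{i(q-p+2)x} & \cdots & \phi(q+1)e^{i(q+1)x} \\ \sum_{v=q-p+1}^{q-p+2}\phi(v)e^{ivx} & \sum_{v=q-p+2}^{q-p+3}\phi(v)e^{ivx} & \cdots & \sum_{v=q+1}^{q+2}\phi(v)e^{ivx} \\ \vdots & \vdots & & \vdots \\ \sum_{v=q-p+1}^{q}\phi(v)e^{ivx} & \sum_{v=q-p+2}^{q+1}\phi(v)e^{ivx} & \cdots & \sum_{v=q+1}^{q+p}\phi(v)e^{ivx} \end{vmatrix}}{\begin{vmatrix} 1 & 1 & \cdots & 1 \\ \phi(q-p+1)e^{i(q-p+1)x} & \phi(q-p+2)e^{i(q-p+2)x} & \cdots & \phi(q+1)e^{i(q+1)x} \\ \sum_{v=q-p+1}^{q-p+2}\phi(v)e^{ivx} & \sum_{v=q-p+2}^{q-p+3}\phi(v)e^{ivx} & \cdots & \sum_{v=q+1}^{q+2}\phi(v)e^{ivx} \\ \vdots & \vdots & & \vdots \\ \sum_{v=q-p+1}^{q}\phi(v)e^{ivx} & \sum_{v=q-p+2}^{q+1}\phi(v)e^{ivx} & \cdots & \sum_{v=q+1}^{q+p}\phi(v)e^{ivx} \end{vmatrix}} \equiv \frac{|A_{1,0}(x)|}{|A_{2,0}(x)|} .$$

Let $A_{i,(j+1)}(x)$ be the matrix obtained by subtracting the (p-j)th
row of $A_{i,j}(x)$ from the (p+1-j)th row of $A_{i,j}(x)$, $i = 1,2$ and
$j = 0,1,\ldots,p-2$. Then

$$\frac{|A_{1,(p-1)}(x)|}{|A_{2,(p-1)}(x)|} = G^{*}(\hat F_{q-p}(x),\ldots,\hat F_q(x)) ;$$

but, by a basic property of determinants, we also have

$$\frac{|A_{1,(p-1)}(x)|}{|A_{2,(p-1)}(x)|} = \frac{|A_{1,0}(x)|}{|A_{2,0}(x)|}$$

and therefore

$$G^{*}(\hat F_{q-p}(x),\ldots,\hat F_q(x)) = G^{**}(\hat F_{q-p}(x),\ldots,\hat F_q(x)) .$$

As noted previously, $e_p(\hat F_q(x))$ estimates the random variable $G^{*}$.
Therefore, under the assumption that $f(\cdot)$ has an ARMA(p,q) representation,
$e_p(\hat F_q(x))$ is seen to estimate a random variable $G^{**}$
which is constructed by the generalized jackknife scheme in such
a way that

$$E\{G^{**}\} = \sum_{v=k}^{\infty}\phi(v)e^{ivx} .$$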

Although it is not possible to obtain a simple expression
for the bias of $e_p(\hat F_q(x))$, the following observations are possible.
Suppose $f(\cdot)$ has an ARMA(p,q) representation. Then the bias of
$e_p(\hat F_q(x))$ is a result of the error inherent in the estimation of
$\phi(v)$, $|v| = 1,2,\ldots,p+q$. This source of bias may essentially be
removed by taking a large enough sample size n. By contrast, the
bias of $\hat F_j(x)$, a logical competitor of $e_p(\hat F_q(x))$, is $-\sum_{v=j+1}^{\infty}\phi(v)e^{ivx}$
regardless of the sample size. These observations provide a motivation
for considering ARMA estimators as a possible alternative
to Fourier series estimators.

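To this point we have considered only the bias of $e_p(\hat F_q(x))$.
In concluding this section we note that the bias of $\hat f_{p,q}(x)$ depends
only upon $\mathrm{Bias}[e_p(\hat F_q(x))]$. We have

$$\mathrm{Bias}[\hat f_{p,q}(x)] = E[\hat f_{p,q}(x)] - f(x)$$

$$= \frac{1}{2\pi}\,[1 + 2\,\mathrm{Real}\{E(e_p(\hat F_q(x))) - E(\hat F_0(x))\}] - \frac{1}{2\pi}\Big[1 + 2\,\mathrm{Real}\Big\{\sum_{v=k}^{\infty}\phi(v)e^{ivx} - F_0(x)\Big\}\Big]$$

$$= \frac{1}{\pi}\,\mathrm{Real}\{E(e_p(\hat F_q(x))) - F_0(x)\} - \frac{1}{\pi}\,\mathrm{Real}\Big\{\sum_{v=k}^{\infty}\phi(v)e^{ivx} - F_0(x)\Big\}$$

$$= \frac{1}{\pi}\,\mathrm{Real}\Big\{E(e_p(\hat F_q(x))) - \sum_{v=k}^{\infty}\phi(v)e^{ivx}\Big\}$$

$$= \frac{1}{\pi}\,\mathrm{Real}\{\mathrm{Bias}[e_p(\hat F_q(x))]\} .$$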


3.4 Alternative Ways of Expressing $\hat f_{p,q}(\cdot)$

In this section some different ways of expressing $\hat f_{p,q}(\cdot)$
are derived which will be useful in later chapters and also show
explicitly how ARMA estimators are related to Fourier series
estimators. The basic result involves using the alternative form
of expressing $e_n(A_m)$ referred to in the previous section. This
result is stated in the following theorem.
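Theorem 3.2

$$e_p(\hat F_q(x)) = \frac{\hat F_q(x) - \hat\alpha_1 e^{ix}\hat F_{q-1}(x) - \cdots - \hat\alpha_p e^{ipx}\hat F_{q-p}(x)}{1 - \hat\alpha_1 e^{ix} - \cdots - \hat\alpha_p e^{ipx}} ,$$

where $(\hat\alpha_1, \hat\alpha_2, \ldots, \hat\alpha_p)$ is the solution of the system of equations

$$\begin{bmatrix} \hat\phi(q) & \hat\phi(q-1) & \cdots & \hat\phi(q-p+1) \\ \hat\phi(q+1) & \hat\phi(q) & \cdots & \hat\phi(q-p+2) \\ \vdots & \vdots & & \vdots \\ \hat\phi(q+p-1) & \hat\phi(q+p-2) & \cdots & \hat\phi(q) \end{bmatrix} \begin{bmatrix} \hat\alpha_1 \\ \hat\alpha_2 \\ \vdots \\ \hat\alpha_p \end{bmatrix} = \begin{bmatrix} \hat\phi(q+1) \\ \hat\phi(q+2) \\ \vdots \\ \hat\phi(q+p) \end{bmatrix} . \qquad (3.4)$$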
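Theorem 3.2 can be checked numerically by evaluating the transform both ways. The sketch below is an illustration, not the author's program: all function names are hypothetical, the determinant form follows the ratio given in Section 3.3, and the von Mises sample is used only as convenient data on [-pi, pi].

```python
import numpy as np

def phi_hat(sample, v):
    """Empirical Fourier coefficient (1/n) * sum_j exp(-i v X_j)."""
    return np.mean(np.exp(-1j * v * np.asarray(sample, dtype=float)))

def partial_sum(sample, j, k, t):
    """F_j(t) = sum_{v=k}^{j} phi_hat(v) e^{ivt}; empty (zero) when j < k."""
    return sum((phi_hat(sample, v) * np.exp(1j * v * t)
                for v in range(k, j + 1)), 0j)

def alpha_hat(sample, p, q):
    """Solve the p x p system (3.4) for (alpha_1, ..., alpha_p)."""
    lhs = np.array([[phi_hat(sample, q + r - c) for c in range(p)]
                    for r in range(p)])
    rhs = np.array([phi_hat(sample, q + r + 1) for r in range(p)])
    return np.linalg.solve(lhs, rhs)

def ep_ratio(sample, p, q, t):
    """e_p(F_q(t)) via the ratio form of Theorem 3.2."""
    k = min(1, q + 1 - p)
    w = alpha_hat(sample, p, q) * np.exp(1j * np.arange(1, p + 1) * t)
    num = partial_sum(sample, q, k, t) - sum(
        w[j - 1] * partial_sum(sample, q - j, k, t) for j in range(1, p + 1))
    return num / (1.0 - w.sum())

def ep_determinant(sample, p, q, t):
    """e_p(F_q(t)) via the ratio of (p+1) x (p+1) determinants."""
    k = min(1, q + 1 - p)
    term = lambda v: phi_hat(sample, v) * np.exp(1j * v * t)
    lower = np.array([[term(q - p + m + j) for j in range(p + 1)]
                      for m in range(1, p + 1)])
    top = np.array([partial_sum(sample, q - p + j, k, t) for j in range(p + 1)])
    num = np.linalg.det(np.vstack([top, lower]))
    den = np.linalg.det(np.vstack([np.ones(p + 1), lower]))
    return num / den

rng = np.random.default_rng(0)
sample = rng.vonmises(0.3, 2.0, size=200)
print(ep_ratio(sample, 2, 3, 0.7), ep_determinant(sample, 2, 3, 0.7))
```

The ratio form needs only one p x p linear solve for all x, whereas the determinant form must be re-evaluated at every point, which is one practical reason to prefer the expression in the theorem.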


Proof:

$$e_p(\hat F_q(x)) = \frac{c_{q-p}(x)\hat F_{q-p}(x) + c_{q-p+1}(x)\hat F_{q-p+1}(x) + \cdots + c_q(x)\hat F_q(x)}{c_{q-p}(x) + c_{q-p+1}(x) + \cdots + c_q(x)} ,$$

where the $c_{q-j}(x)$ are cofactors of the first row in either the
numerator or denominator determinant of $e_p(\hat F_q(x))$. By performing
appropriate row and column operations within these cofactors it
is easily verified that

$$e_p(\hat F_q(x)) = \frac{a_0\hat F_q(x) - a_1 e^{ix}\hat F_{q-1}(x) - \cdots - a_p e^{ipx}\hat F_{q-p}(x)}{a_0 - a_1 e^{ix} - \cdots - a_p e^{ipx}} \qquad (3.5)$$

where

$$a_j = \begin{vmatrix} \hat\phi(q) & \hat\phi(q-1) & \cdots & \hat\phi(q-j+2) & \hat\phi(q+1) & \hat\phi(q-j) & \cdots & \hat\phi(q-p+1) \\ \hat\phi(q+1) & \hat\phi(q) & \cdots & \hat\phi(q-j+3) & \hat\phi(q+2) & \hat\phi(q-j+1) & \cdots & \hat\phi(q-p+2) \\ \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\ \hat\phi(q+p-1) & \hat\phi(q+p-2) & \cdots & \hat\phi(q+p-j+1) & \hat\phi(q+p) & \hat\phi(q+p-j-1) & \cdots & \hat\phi(q) \end{vmatrix} ,$$

$j = 1,2,\ldots,p$ and

$$a_0 = \begin{vmatrix} \hat\phi(q) & \hat\phi(q-1) & \cdots & \hat\phi(q-p+1) \\ \hat\phi(q+1) & \hat\phi(q) & \cdots & \hat\phi(q-p+2) \\ \vdots & \vdots & & \vdots \\ \hat\phi(q+p-1) & \hat\phi(q+p-2) & \cdots & \hat\phi(q) \end{vmatrix} .$$

It follows that $(a_1/a_0, a_2/a_0, \ldots, a_p/a_0)$ is the Cramer's rule solution
to the system in (3.4). By dividing numerator and denominator of
(3.5) by $a_0$ the result follows.

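By the previous theorem $\hat f_{p,q}(x)$ may be expressed as

$$\hat f_{p,q}(x) = \frac{1}{2\pi}\Big[1 + 2\,\mathrm{Real}\Big(\frac{\hat F_q(x) - \hat\alpha_1 e^{ix}\hat F_{q-1}(x) - \cdots - \hat\alpha_p e^{ipx}\hat F_{q-p}(x)}{1 - \hat\alpha_1 e^{ix} - \cdots - \hat\alpha_p e^{ipx}} - \hat F_0(x)\Big)\Big] .$$

The results of Chapter II show that, if $\hat\alpha_1,\ldots,\hat\alpha_p$ satisfy
condition S, $\hat f_{p,q}(\cdot)$ satisfies

$$\hat\phi_{p,q}(v) = \hat\phi(v), \quad |v| = 0,1,\ldots,p+q ,$$

where

$$\hat\phi_{p,q}(v) = \int_{-\pi}^{\pi} e^{-ivx}\hat f_{p,q}(x)\,dx .$$

Using this fact and the result of Theorem 3.2, it is informative
to re-express $\hat f_{p,q}(x)$ as

$$\hat f_{p,q}(x) = \frac{1}{2\pi}\Big[1 + 2\,\mathrm{Real}\Big\{e_p(\hat F_q(x)) - \hat F_0(x) + \sum_{v=1}^{p+q}\hat\phi(v)e^{ivx} - \sum_{v=1}^{p+q}\hat\phi(v)e^{ivx}\Big\}\Big]$$

$$= \frac{1}{2\pi}\sum_{v=-p-q}^{p+q}\hat\phi(v)e^{ivx} + \frac{1}{\pi}\,\mathrm{Real}\{e_p(\hat F_q(x)) - \hat F_{p+q}(x)\}$$

$$= \hat f_{0,p+q}(x) + \frac{1}{\pi}\,\mathrm{Real}\Big[\frac{\sum_{j=1}^{p}\hat\alpha_j e^{ijx}\sum_{v=q-j+1}^{p+q}\hat\phi(v)e^{ivx} - \sum_{v=q+1}^{p+q}\hat\phi(v)e^{ivx}}{1 - \hat\alpha_1 e^{ix} - \cdots - \hat\alpha_p e^{ipx}}\Big]$$

$$= \hat f_{0,p+q}(x) + \frac{1}{\pi}\,\mathrm{Real}\Big[\frac{\sum_{j=1}^{p}\hat\alpha_j e^{ijx}\sum_{v=p+q-j+1}^{p+q}\hat\phi(v)e^{ivx}}{1 - \hat\alpha_1 e^{ix} - \cdots - \hat\alpha_p e^{ipx}}\Big]$$

$$= \hat f_{0,p+q}(x) + \hat g_{p,q}(x) , \qquad (3.6)$$

where the next-to-last equality uses the equations in (3.4) to cancel
the terms $\hat\phi(q+1)e^{i(q+1)x},\ldots,\hat\phi(q+p)e^{i(q+p)x}$.

Expression (3.6) shows $\hat f_{p,q}(\cdot)$ to be the sum of a Fourier series
estimator $\hat f_{0,p+q}(\cdot)$ and a function $\hat g_{p,q}(\cdot)$ which, under condition
S, has the Fourier series expansion

$$\frac{1}{2\pi}\sum_{|v|>p+q}\hat\phi_{p,q}(v)e^{ivx} ,$$

where the $\hat\phi_{p,q}(v)$ are extrapolated from $\hat\phi(v)$, $|v| = 0,1,\ldots,p+q$,
using the difference equation $y(v) - \hat\alpha_1 y(v-1) - \cdots - \hat\alpha_p y(v-p) = 0$.
We note, though, that (3.6) is valid regardless of whether or not
condition S holds, although $\hat g_{p,q}(\cdot)$ does not have the same interpretation
in this case. The validity of (3.6) will be useful in
Chapter V when we consider estimating the MISE of $\hat f_{p,q}(\cdot)$.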
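The decomposition in (3.6) can be verified numerically. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the ARMA estimate is computed from Definition 3.1 with the ratio form of Theorem 3.2, and the correction term is written as (1/pi) times the real part of a sum over j of alpha_j e^{ijx} multiplied by the last j retained Fourier terms, divided by the autoregressive polynomial.

```python
import numpy as np

def phi_hat(sample, v):
    """Empirical Fourier coefficient (1/n) * sum_j exp(-i v X_j)."""
    return np.mean(np.exp(-1j * v * np.asarray(sample, dtype=float)))

def partial_sum(sample, j, k, t):
    """F_j(t) = sum_{v=k}^{j} phi_hat(v) e^{ivt}; empty (zero) when j < k."""
    return sum((phi_hat(sample, v) * np.exp(1j * v * t)
                for v in range(k, j + 1)), 0j)

def alpha_hat(sample, p, q):
    """Solve the p x p system (3.4)."""
    lhs = np.array([[phi_hat(sample, q + r - c) for c in range(p)]
                    for r in range(p)])
    rhs = np.array([phi_hat(sample, q + r + 1) for r in range(p)])
    return np.linalg.solve(lhs, rhs)

def arma_estimate(sample, p, q, t):
    """f_{p,q}(t) from Definition 3.1, using the ratio form of e_p."""
    k = min(1, q + 1 - p)
    w = alpha_hat(sample, p, q) * np.exp(1j * np.arange(1, p + 1) * t)
    ep = (partial_sum(sample, q, k, t) - sum(
        w[j - 1] * partial_sum(sample, q - j, k, t)
        for j in range(1, p + 1))) / (1.0 - w.sum())
    return (1.0 + 2.0 * (ep - partial_sum(sample, 0, k, t)).real) / (2.0 * np.pi)

def fourier_estimate(sample, m, t):
    """Fourier series estimate f_{0,m}(t)."""
    s = partial_sum(sample, m, 1, t)
    return (1.0 + 2.0 * s.real) / (2.0 * np.pi)

def g_hat(sample, p, q, t):
    """Correction term g_{p,q}(t) appearing in (3.6)."""
    w = alpha_hat(sample, p, q) * np.exp(1j * np.arange(1, p + 1) * t)
    top = sum(w[j - 1] * sum(phi_hat(sample, v) * np.exp(1j * v * t)
                             for v in range(p + q - j + 1, p + q + 1))
              for j in range(1, p + 1))
    return (top / (1.0 - w.sum())).real / np.pi

rng = np.random.default_rng(2)
sample = rng.vonmises(-0.5, 1.5, size=250)
```

Evaluating `arma_estimate` against `fourier_estimate(sample, p + q, t) + g_hat(sample, p, q, t)` at a few points gives agreement to rounding error, whether or not condition S holds for the fitted coefficients.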

P.9 

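A simple example which illustrates the consequences of (3.6)
will be helpful at this point. Consider the ARMA estimator $\hat f_{1,q}(\cdot)$.
By (3.6), we have

$$\hat f_{1,q}(x) = \hat f_{0,q+1}(x) + \frac{1}{\pi}\,\mathrm{Real}\Big[\frac{\hat\alpha_1 e^{ix}\hat\phi(q+1)e^{i(q+1)x}}{1 - \hat\alpha_1 e^{ix}}\Big] , \quad \hat\alpha_1 = \frac{\hat\phi(q+1)}{\hat\phi(q)} .$$

If $|\hat\alpha_1| < 1$, $\dfrac{1}{1 - \hat\alpha_1 e^{ix}} = \sum_{v=0}^{\infty}\hat\alpha_1^{v}e^{ivx}$ and thus

$$\frac{\hat\alpha_1 e^{ix}\hat\phi(q+1)e^{i(q+1)x}}{1 - \hat\alpha_1 e^{ix}} = \sum_{v=0}^{\infty}[\hat\alpha_1\hat\phi(q+1)]\,\hat\alpha_1^{v}e^{i(v+q+2)x} = \sum_{v=q+2}^{\infty}[\hat\alpha_1\hat\phi(q+1)]\,\hat\alpha_1^{v-q-2}e^{ivx} .$$

Therefore,

$$\hat f_{1,q}(x) = \hat f_{0,q+1}(x) + \frac{1}{2\pi}\sum_{|v|>q+1}\hat\phi_{1,q}(v)e^{ivx}$$

where

$$\hat\phi_{1,q}(v) = [\hat\alpha_1\hat\phi(q+1)]\,\hat\alpha_1^{v-q-2}, \quad v = q+2, q+3, \ldots \qquad (3.7)$$

Obviously $\hat\phi_{1,q}(v) - \hat\alpha_1\hat\phi_{1,q}(v-1) = 0$ for $v > q+2$, but we also
have (by (3.7))

$$\hat\phi_{1,q}(q+2) - \hat\alpha_1\hat\phi_{1,q}(q+1) = \hat\phi_{1,q}(q+2) - \hat\alpha_1\hat\phi(q+1) = 0 .$$

This shows explicitly how the $\hat\phi_{1,q}(v)$, $v = q+2, q+3, \ldots$, are extrapolated
from $\hat\phi(q)$ and $\hat\phi(q+1)$ by using $y(v) - \hat\alpha_1 y(v-1) = 0$.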

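Suppose now that in the above case condition S is not
satisfied, i.e. suppose $|\hat\alpha_1| > 1$. We then have

$$\frac{\hat\alpha_1\hat\phi(q+1)e^{i(q+2)x}}{1 - \hat\alpha_1 e^{ix}} = \frac{-\hat\phi(q+1)e^{i(q+1)x}}{1 - \hat\alpha_1^{-1}e^{-ix}} = -\hat\phi(q+1)e^{i(q+1)x}\sum_{v=0}^{\infty}\hat\alpha_1^{-v}e^{-ivx} .$$

Using this expression and (3.6) it follows that

$$\hat f_{1,q}(x) = \frac{1}{2\pi}\Big[\hat\phi_{1,q}(0) + 2\,\mathrm{Real}\Big(\sum_{v=1}^{\infty}\hat\phi_{1,q}(v)e^{ivx}\Big)\Big]$$

where

$$\hat\phi_{1,q}(0) = 1 - 2\,\mathrm{Real}[\hat\phi(q+1)\hat\alpha_1^{-q-1}]$$

and

$$\hat\phi_{1,q}(v) = \begin{cases} \hat\phi(v) - \hat\phi(q+1)\hat\alpha_1^{v-q-1} - \overline{\hat\phi(q+1)}\;\overline{\hat\alpha}_1^{\,-v-q-1}, & v = 1,\ldots,q+1 \\ -\overline{\hat\phi(q+1)}\;\overline{\hat\alpha}_1^{\,-v-q-1}, & v = q+2, q+3, \ldots \end{cases} \qquad (3.8)$$

It is easily verified that $\hat\phi_{1,q}(v) - \overline{\hat\alpha}_1^{\,-1}\hat\phi_{1,q}(v-1) = 0$ for $v > q$.
However, by (3.8), $\hat f_{1,q}(\cdot)$ does not integrate to 1 and does not
satisfy

$$\hat\phi_{1,q}(v) = \hat\phi(v), \quad |v| = 1,\ldots,q+1 .$$

Therefore, $\hat f_{1,q}(\cdot)$ is not as easily interpreted in this case as it
is when condition S is satisfied. It should be pointed out, though,
that the efficacy of $\hat f_{p,q}(\cdot)$ as an estimate of $f(\cdot)$ may be assessed
regardless of whether or not condition S is satisfied, as will be
shown in Chapter V.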


3.5 The Mixture of Densities Having ARMA Representations

As pointed out in Chapter I, Carmichael approaches the
density estimation problem by using autoregressive schemes to
represent the density $f(\cdot)$. Under fairly mild smoothness conditions
on $f(\cdot)$ Carmichael shows that

$$\lim_{p\to\infty} f_{p,0}(x) = f(x), \quad \text{uniformly in } x, \qquad (3.9)$$

which implies the existence of a $p_0$ (for $\varepsilon$ arbitrarily small) such that
$|f(x) - f_{p_0,0}(x)| < \varepsilon$, a.e. This result provides a justification
for using autoregressive representations in the estimation
of probability densities. However, it also leads indirectly to a
justification for considering ARMA representations. In order to
show why this is so we state and prove the following theorem.


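Theorem 3.3 Let $f_{p_j,q_j}(\cdot)$ (j = 1,2) be a probability density
function (defined on $[-\pi,\pi]$) having the ARMA representation

$$f_{p_j,q_j}(x) = \frac{\sum_{v=-q_j}^{q_j}\beta_{jv}e^{ivx}}{|1 - \alpha_{j1}e^{ix} - \cdots - \alpha_{jp_j}e^{ip_jx}|^{2}}$$

and let $0 < \gamma < 1$.
Then the mixture density $\gamma f_{p_1,q_1}(\cdot) + (1-\gamma)f_{p_2,q_2}(\cdot)$
has an ARMA($p_1 + p_2$, k) representation, where $k \le \max(q_1+p_2,\, q_2+p_1)$.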
Proof: Let $|1 - \alpha_{j1}e^{ix} - \cdots - \alpha_{jp_j}e^{ip_jx}|^{2} = a_j(x)$, j = 1,2.
Then,

$$\gamma f_{p_1,q_1}(x) + (1-\gamma)f_{p_2,q_2}(x) = \frac{\gamma\sum_{v=-q_1}^{q_1}\beta_{1v}e^{ivx}}{a_1(x)} + \frac{(1-\gamma)\sum_{v=-q_2}^{q_2}\beta_{2v}e^{ivx}}{a_2(x)}$$

$$= \frac{\gamma a_2(x)\sum_{v=-q_1}^{q_1}\beta_{1v}e^{ivx} + (1-\gamma)a_1(x)\sum_{v=-q_2}^{q_2}\beta_{2v}e^{ivx}}{a_1(x)a_2(x)} .$$

The denominator is obviously of the form

$$|1 - \alpha_1 e^{ix} - \cdots - \alpha_{p_1+p_2}e^{i(p_1+p_2)x}|^{2} .$$

In addition, since $a_j(x)$ may be expressed as $\sum_{v=-p_j}^{p_j} b_{jv}e^{ivx}$, the numerator
is of the form $\sum_{v=-k}^{k}\beta_v e^{ivx}$ where k does not exceed $\max(q_1+p_2,\, q_2+p_1)$.
The result thus follows.
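The construction in the proof can be illustrated with two first-order autoregressive densities. The sketch below assumes real coefficients with modulus less than one, for which (1 - a^2) / (2 pi |1 - a e^{ix}|^2) is a density on [-pi, pi]; the function names are hypothetical. The mixture is rewritten with a second-order autoregressive denominator and a numerator of order one, in line with the bound k <= max(p_1, p_2).

```python
import numpy as np

def ar1_density(x, a):
    """Density (1 - a^2) / (2 pi |1 - a e^{ix}|^2) for real |a| < 1."""
    return (1.0 - a ** 2) / (2.0 * np.pi * np.abs(1.0 - a * np.exp(1j * x)) ** 2)

def mixture_as_arma(gamma, a1, a2):
    """Return (beta, alpha) for the mixture of two AR(1) densities.

    beta  : numerator coefficients (beta_{-1}, beta_0, beta_1)
    alpha : (alpha_1, alpha_2) in |1 - alpha_1 e^{ix} - alpha_2 e^{2ix}|^2
    """
    # |1 - a e^{ix}|^2 = -a e^{-ix} + (1 + a^2) - a e^{ix}
    sq1 = np.array([-a1, 1.0 + a1 ** 2, -a1])
    sq2 = np.array([-a2, 1.0 + a2 ** 2, -a2])
    c1 = (1.0 - a1 ** 2) / (2.0 * np.pi)
    c2 = (1.0 - a2 ** 2) / (2.0 * np.pi)
    beta = gamma * c1 * sq2 + (1.0 - gamma) * c2 * sq1
    # (1 - a1 z)(1 - a2 z) = 1 - (a1 + a2) z + a1 a2 z^2
    alpha = np.array([a1 + a2, -a1 * a2])
    return beta, alpha

def arma_density(x, beta, alpha):
    """Evaluate sum_v beta_v e^{ivx} / |1 - sum_j alpha_j e^{ijx}|^2."""
    q = (len(beta) - 1) // 2
    num = sum(beta[v + q] * np.exp(1j * v * x) for v in range(-q, q + 1)).real
    den = np.abs(1.0 - sum(alpha[j] * np.exp(1j * (j + 1) * x)
                           for j in range(len(alpha)))) ** 2
    return num / den

grid = np.linspace(-np.pi, np.pi, 101)
gamma, a1, a2 = 0.3, 0.6, -0.4
beta, alpha = mixture_as_arma(gamma, a1, a2)
mixture = gamma * ar1_density(grid, a1) + (1.0 - gamma) * ar1_density(grid, a2)
```

The numerator coefficient of e^{ix} vanishes only for special combinations of gamma, a1 and a2, which illustrates why the moving average order of a mixture is positive in general.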


By induction a similar result follows for the mixture of
m ARMA densities ($m \ge 3$).

A special case of Theorem 3.3 which is of interest is the
mixture of autoregressive densities. The mixture of $f_{p_1,0}(\cdot)$ and
$f_{p_2,0}(\cdot)$ is, by Theorem 3.3, an ARMA($p_1 + p_2$, k) density where
$k \le \max(p_1,p_2)$ and, in general, k > 0. Carmichael's result (3.9)
and Theorem 3.3 are thus seen to provide a strong motivation for
ARMA representations in situations where $f(\cdot)$ arises as the
mixture of densities. Of course (3.9) shows that even the
mixture of densities may be well approximated by an autoregressive
scheme. However, the ARMA($p_1 + p_2$, k) representation
will necessarily be more parsimonious than a satisfactory autoregressive
representation. This is important in the stochastic
setting where fitting too many parameters is to be avoided.
CHAPTER IV

LARGE SAMPLE PROPERTIES OF ARMA DENSITY ESTIMATORS


4.1 Introduction

We continue our study of ARMA density estimators by
establishing some of their large sample properties. In Theorem
4.1 conditions are stated under which $\hat f_{p,q}(\cdot)$ converges in probability
to $f(\cdot)$, where p remains fixed and q tends to infinity
at a specified rate with the sample size n. The results of
Section 4.3 are the stochastic analogs of some of the work of
McWilliams (1969) involving the $e_1$-transform. Sufficient conditions,
which are more informative than those in Theorem 4.1, are
established for the convergence in probability of $\hat f_{1,q}(\cdot)$ to $f(\cdot)$
(as q and n tend to infinity). More importantly, $\hat f_{1,q}(\cdot)$ is
shown to possess a certain optimality property for densities
satisfying

$$\lim_{v\to\infty}\frac{\phi(v+1)}{\phi(v)} = R .$$

Finally, we point out that higher order ($p \ge 2$) results paralleling
those for $\hat f_{1,q}(\cdot)$ are undoubtedly obtainable.

4.2 Conditions for the Consistency of $\hat f_{p,q}(\cdot)$

A minimal requirement of any density estimator is its convergence
(in some sense) to the true density function as the sample
size tends to infinity. Considerable attention in the literature
has been focused upon establishing some form of consistency for
various types of density estimators. The mean square error consistency
of Fourier series estimators has already been indicated
in Chapter I. Parzen (1962) has proven that for suitably chosen
weighting functions $K(\cdot)$, the kernel density estimator

$$\hat f(x) = \frac{1}{nh}\sum_{j=1}^{n} K\Big(\frac{x - X_j}{h}\Big)$$

is mean square error consistent for f(x) if h = h(n) satisfies
$\lim_{n\to\infty} h(n) = 0$ and $\lim_{n\to\infty} nh(n) = \infty$. Other estimators and
different convergence criteria have also been considered (see
Tapia and Thompson (1978)).

In the following theorem conditions are stated under which
$\hat f_{p,q}(\cdot)$ converges in probability to $f(\cdot)$.

Theorem 4.1 Suppose $f(\cdot)$ is a density defined on $[-\pi,\pi]$ which
is continuous and of bounded variation on that interval. Based
on the random sample $X_1,\ldots,X_n$ from $f(\cdot)$, let $\hat\alpha_j^{(p,q)}$ (j = 1,2,...,p)
be the solution of the system of equations in (3.4). For a fixed
$p \ge 1$ and $q(n) = o(\sqrt{n})$ (where $\lim_{n\to\infty} q(n) = \infty$), suppose

$$\operatorname*{p-lim}_{n\to\infty}\;\frac{|\hat\alpha_j^{(p,q)}|\sum_{v=p+q-j+1}^{p+q}|\hat\phi(v)|}{z_{p,q}} = 0, \quad j = 1,\ldots,p, \qquad (4.1)$$

where $z_{p,q} = \min_{x\in[-\pi,\pi]}|1 - \hat\alpha_1^{(p,q)}e^{ix} - \cdots - \hat\alpha_p^{(p,q)}e^{ipx}|$. Then

$$\operatorname*{p-lim}_{n\to\infty}\hat f_{p,q}(x) = f(x) \quad \text{for all } x \in [-\pi,\pi] .$$

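Proof: Using the relationship established in Chapter III we have
(assuming the existence of each limit)

$$\operatorname*{p-lim}_{n\to\infty}\hat f_{p,q}(x) = \operatorname*{p-lim}_{n\to\infty}\hat f_{0,p+q}(x) + \operatorname*{p-lim}_{n\to\infty}\hat g_{p,q}(x) .$$

Since $f(\cdot)$ is continuous and of bounded variation on $[-\pi,\pi]$, f(x)
has a Fourier series representation, and thus

$$\mathrm{Bias}[\hat f_{0,p+q}(x)] = -\frac{1}{2\pi}\sum_{|v|>p+q}\phi(v)e^{ivx} , \qquad (4.2)$$

which tends to zero as $q \to \infty$. We have also

$$\mathrm{var}[\hat f_{0,p+q}(x)] = \frac{1}{\pi^{2}}\,\mathrm{var}\Big[\mathrm{Real}\Big(\sum_{v=1}^{p+q}\hat\phi(v)e^{ivx}\Big)\Big] \le \frac{1}{\pi^{2}}\,\mathrm{var}\Big[\sum_{v=1}^{p+q}\hat\phi(v)e^{ivx}\Big]$$

$$= \frac{1}{\pi^{2}}\sum_{v=1}^{p+q}\mathrm{var}(\hat\phi(v)) + \frac{1}{\pi^{2}}\sum_{v=1}^{p+q}\sum_{k\ne v}\mathrm{cov}(\hat\phi(v),\hat\phi(k))\,e^{i(v-k)x}$$

$$= \frac{1}{\pi^{2}n}\sum_{v=1}^{p+q}(1 - |\phi(v)|^{2}) + \frac{1}{\pi^{2}n}\sum_{v=1}^{p+q}\sum_{k\ne v}(\phi(v-k) - \phi(v)\phi(-k))\,e^{i(v-k)x}$$

$$\le \frac{1}{\pi^{2}}\Big[\frac{p+q}{n} + \frac{2(p+q)(p+q-1)}{n}\Big] .$$

If p is fixed and $q = o(\sqrt{n})$ it thus follows that

$$\lim_{n\to\infty}\mathrm{var}[\hat f_{0,p+q}(x)] = 0 . \qquad (4.3)$$

By (4.2) and (4.3) we have (for $q = o(\sqrt{n})$ and unbounded)

$$\lim_{n\to\infty} E[\hat f_{0,p+q}(x) - f(x)]^{2} = 0 ,$$

and thus

$$\operatorname*{p-lim}_{n\to\infty}\hat f_{0,p+q}(x) = f(x) .$$

We must now show $\operatorname*{p-lim}_{n\to\infty}\hat g_{p,q}(x) = 0$ in order to prove the theorem.
Recall that

$$\hat g_{p,q}(x) = \frac{1}{\pi}\,\mathrm{Real}\Big[\sum_{j=1}^{p}\hat g_j^{(p,q)}(x)\Big] ,$$

where

$$\hat g_j^{(p,q)}(x) = \frac{\hat\alpha_j^{(p,q)}e^{ijx}\sum_{v=p+q-j+1}^{p+q}\hat\phi(v)e^{ivx}}{1 - \hat\alpha_1^{(p,q)}e^{ix} - \cdots - \hat\alpha_p^{(p,q)}e^{ipx}} .$$

Since

$$\frac{|\hat\alpha_j^{(p,q)}|\sum_{v=p+q-j+1}^{p+q}|\hat\phi(v)|}{z_{p,q}}$$

bounds $|\hat g_j^{(p,q)}(x)|$, it follows (from condition (4.1)) that
$\operatorname*{p-lim}_{n\to\infty}\hat g_j^{(p,q)}(x) = 0$. Therefore, $\operatorname*{p-lim}_{n\to\infty}\sum_{j=1}^{p}\hat g_j^{(p,q)}(x) = 0$ and
consequently $\operatorname*{p-lim}_{n\to\infty}\hat g_{p,q}(x) = 0$. Since x was chosen arbitrarily
the result follows.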
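For a given sample the quantities appearing in condition (4.1) can be computed directly, which gives a rough feel for how large they are at a particular n, p and q. The sketch below is illustrative only: the function names are hypothetical, and the minimum defining z is approximated on a finite grid rather than found exactly.

```python
import numpy as np

def phi_hat(sample, v):
    """Empirical Fourier coefficient (1/n) * sum_j exp(-i v X_j)."""
    return np.mean(np.exp(-1j * v * np.asarray(sample, dtype=float)))

def alpha_hat(sample, p, q):
    """Solve the p x p system (3.4)."""
    lhs = np.array([[phi_hat(sample, q + r - c) for c in range(p)]
                    for r in range(p)])
    rhs = np.array([phi_hat(sample, q + r + 1) for r in range(p)])
    return np.linalg.solve(lhs, rhs)

def condition_41_terms(sample, p, q, n_grid=2048):
    """Return the p ratios |alpha_j| * sum |phi_hat(v)| / z from (4.1).

    z is the minimum modulus of 1 - sum_j alpha_j e^{ijx}, approximated
    here by a minimum over n_grid equally spaced points in [-pi, pi].
    """
    alpha = alpha_hat(sample, p, q)
    grid = np.linspace(-np.pi, np.pi, n_grid)
    poly = 1.0 - sum(alpha[j] * np.exp(1j * (j + 1) * grid) for j in range(p))
    z = np.abs(poly).min()
    return np.array([
        abs(alpha[j - 1])
        * sum(abs(phi_hat(sample, v)) for v in range(p + q - j + 1, p + q + 1))
        / z
        for j in range(1, p + 1)
    ])

rng = np.random.default_rng(1)
sample = rng.vonmises(0.0, 1.5, size=300)
terms = condition_41_terms(sample, 2, 4)
```

Small values of every entry of `terms` indicate that the correction term is uniformly small for this sample; the theorem itself concerns the limiting behavior of these ratios as n grows with q = o(sqrt(n)).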

Several comments are in order regarding Theorem 4.1. First,
it should be pointed out that, since the method of estimating $\phi(v)$
is fixed, condition (4.1) is implicitly a condition on the Fourier
coefficients of $f(\cdot)$. In this work, however, the problem of
translating (4.1) into explicit conditions on the sequence $\{\phi(v)\}$ has
not been solved. It is hoped that a satisfactory solution to this
problem may be obtained after future research. For the present,
though, we note that the importance of Theorem 4.1 lies in the
fact that it points out where the difficulty rests in inducing
convergence from $\hat f_{p,q}(\cdot)$. Since $\hat f_{0,p+q}(x)$ is consistent for
$f(x)$, it is clear that conditions need only be established to
insure that $\operatorname{p-lim}_{n\to\infty}\hat g_{p,q}(x) = 0$.




Although we will not be able to substitute conditions for
(4.1) which are as explicit as desired, the following observations
make (4.1) more palatable. We have

\[
\sum_{v=p+q-j+1}^{p+q}|\hat\phi(v)|
 = \sum_{v=p+q-j+1}^{p+q}|\hat\phi(v)-\phi(v)+\phi(v)|
 \le \sum_{v=p+q-j+1}^{p+q}|\hat\phi(v)-\phi(v)|
 + \sum_{v=p+q-j+1}^{p+q}|\phi(v)|,
\]

which converges in probability to zero (as $n\to\infty$) by an argument
similar to that in Theorem 4.1. In addition, it is easily verified
that

\[ \sum_{v=p+q-j+1}^{p+q}|\hat\phi(v)-\phi(v)| = O_p\Big(\frac{1}{\sqrt{n}}\Big) \]

and thus a set of conditions which may replace condition (4.1) is:

k(p*q)| 


(i) 


p.q 


op(i^n)  ,  (J  ■  1,2, . . . ,p) 


\[ \text{(ii)}\quad \sum_{v=q+1}^{p+q}|\phi(v)| = O\Big(\frac{1}{\sqrt{n}}\Big). \]

Although it is still not precisely clear for which densities (i)
and (ii) are valid, this set of conditions is somewhat more
informative than condition (4.1).

It is important to note at this point that one member of the
ARMA class, $\hat f_{0,q}(\cdot)$, is mean square error consistent for $f(\cdot)$ under
the single condition that $f(\cdot)$ have a Fourier series representation
for all $x \in [-\pi,\pi]$. This fact was proven in Theorem 4.1. Because
of the consistency of $\hat f_{0,q}(\cdot)$, Theorem 4.1 would not be extremely
important unless it could be shown that $\hat f_{p,q}(\cdot)$ (for $p \ge 1$) in
some sense converges more rapidly to $f(\cdot)$ than does $\hat f_{0,q}(\cdot)$.
Establishing conditions which insure the more rapid convergence
of $\hat f_{p,q}(\cdot)$ proves to be quite difficult in general. However,
in the next section we consider the special case of $\hat f_{1,q}(\cdot)$ and
obtain some quite satisfying results.


4.3 Large Sample Results Involving $\hat f_{1,q}(\cdot)$


Given a complex-valued sequence of partial sums $\{A_m, A_{m+1}, \ldots\}$
which converges to $A_\infty$, McWilliams (1969) has established conditions
under which the following results hold:

\[
\lim_{m\to\infty} e_n(A_m) = A_\infty \ (n = 1,2), \qquad
\lim_{m\to\infty}\frac{A_\infty - e_1(A_m)}{A_\infty - A_{m+1}} = 0,
\]

\[
\lim_{m\to\infty}\frac{A_\infty - e_2(A_{m+2})}{A_\infty - A_{m+j}} = 0 \quad (\text{for any } j).
\]

The theorems in the present section involve $e_1(\hat F_q(x))$ and are
the stochastic analogs of the above results involving $e_1(A_m)$.
Essentially the same results are obtained with lim replaced by
p-lim.

A sufficient condition for
$\lim_{m\to\infty}\dfrac{A_\infty - e_1(A_m)}{A_\infty - A_{m+1}} = 0$ is

\[ \lim_{m\to\infty}\frac{a_m}{a_{m-1}} = R, \quad 0 < |R| < 1, \tag{4.4} \]

where $a_m = A_m - A_{m-1}$. Recalling Theorem 2.2 it is seen that, if

\[ \frac{a_m}{a_{m-1}} = R \quad \text{for } m > m_0, \]

then

\[ A_\infty - e_1(A_m) = 0 \quad \text{for } m \ge m_0. \]

Condition (4.4) is thus seen to be a relaxing of the condition
needed for $e_1(A_m)$ to be exact, with the result being that $e_1(A_m)$
converges more rapidly than $A_{m+1}$.
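The exactness property can be checked numerically. The sketch below is illustrative only: it assumes the form $e_1(A_m) = A_{m-1} + a_m/(1 - a_{m+1}/a_m)$, which is the expression used for $e_1(\hat F_q(x))$ in the proof of Theorem 4.2, and it uses a hypothetical geometric series $a_m = R^m$ (the value of `R` is arbitrary).

```python
import numpy as np

def e1(partial_sums, m):
    """e1-transform of A_m, using A_{m-1}, A_m and A_{m+1}.

    partial_sums[k] holds A_k (with A_0 = 0); a_m = A_m - A_{m-1}.
    """
    a_m = partial_sums[m] - partial_sums[m - 1]
    a_next = partial_sums[m + 1] - partial_sums[m]
    return partial_sums[m - 1] + a_m / (1.0 - a_next / a_m)

# Illustrative geometric terms a_m = R**m, so a_m / a_{m-1} = R exactly.
R = 0.6 * np.exp(0.8j)
A = np.concatenate(([0.0], np.cumsum(R ** np.arange(1, 12))))
limit = R / (1.0 - R)                      # A_infinity for this series

err_partial = abs(limit - A[4])            # error of the partial sum A_{m+1}, m = 3
err_e1 = abs(limit - e1(A, 3))             # error of e1(A_3), same information used
print(err_partial, err_e1)
```

When the ratio is exactly constant the transform reproduces the limit up to rounding error, whereas the partial sum built from the same terms still carries the full geometric tail.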

In the setting of interest here we have $a_m = \phi(m)e^{imx}$
and $\dfrac{a_m}{a_{m-1}} = \dfrac{\phi(m)}{\phi(m-1)}e^{ix}$. In this case, then, a condition
equivalent to (4.4) is

\[ \lim_{m\to\infty}\frac{\phi(m)}{\phi(m-1)} = R, \quad 0 < |R| < 1. \tag{4.5} \]

If (4.5) holds we have

\[ \lim_{m\to\infty}\frac{F_\infty(x) - e_1(F_m(x))}{F_\infty(x) - F_{m+1}(x)} = 0, \]

where $F_m(x) = \sum_{v=1}^{m}\phi(v)e^{ivx}$. This property suggests that, under
(4.5), $e_1(\hat F_m(x))$ might converge more rapidly in some stochastic
sense than does $\hat F_{m+1}(x)$. This possibility will be investigated
later, but first it is necessary to verify that $e_1(\hat F_q(x))$ does
indeed converge to $F_\infty(x)$ under a condition similar to (4.5).
The verification of this fact is the subject of the next theorem.


Theorem 4.2 Suppose $f(\cdot)$ is a probability density function
defined on $[-\pi,\pi]$ and that $f(\cdot)$ has a Fourier series
representation for all $x$ in that interval. Further, suppose that (4.5) holds
and that $R^q/\phi(q) = O(1)$ (as $n\to\infty$). If $\hat f_{1,q}(\cdot)$ is based on a
random sample $X_1,\ldots,X_n$ from $f(\cdot)$, and $q = o(\ln n)$ (with $q$
unbounded), then

\[ \operatorname{p-lim}_{n\to\infty}\hat f_{1,q}(x) = f(x) \quad \text{for all } x \in [-\pi,\pi]. \]

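Proof: Since $\hat f_{1,q}(x) = \frac{1}{2\pi}[1 + 2\operatorname{Real}(e_1(\hat F_q(x)))]$,
$f(x) = \frac{1}{2\pi}[1 + 2\operatorname{Real}(F_\infty(x))]$,
and

\[ \operatorname{p-lim}_{n\to\infty}\operatorname{Real}(Z_n) = \operatorname{Real}[\operatorname{p-lim}_{n\to\infty} Z_n], \]

it is sufficient to show that

\[ \operatorname{p-lim}_{n\to\infty} e_1(\hat F_q(x)) = F_\infty(x). \]

Observe that, with $\hat\alpha_{(q)} = \hat\phi(q+1)/\hat\phi(q)$,

\[
e_1(\hat F_q(x))
 = \frac{\hat\phi(q)e^{iqx} + \hat F_{q-1}(x)\,[1-\hat\alpha_{(q)}e^{ix}]}{1-\hat\alpha_{(q)}e^{ix}}
 = \hat F_{q-1}(x) + \frac{\hat\phi(q)e^{iqx}}{1-\hat\alpha_{(q)}e^{ix}}.
\]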


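Since $q = o(\ln n)$ it follows from the proof of Theorem 4.1 that
$\operatorname{p-lim}_{n\to\infty}\hat F_{q-1}(x) = F_\infty(x)$. Thus, if

\[ \operatorname{p-lim}_{n\to\infty}\frac{\hat\phi(q)e^{iqx}}{1-\hat\alpha_{(q)}e^{ix}} = 0 \]

the result is proven. We have

\[
\operatorname{p-lim}_{n\to\infty}\hat\phi(q)
 = \lim_{n\to\infty}\phi(q) + \operatorname{p-lim}_{n\to\infty}(\hat\phi(q)-\phi(q))
 = \operatorname{p-lim}_{n\to\infty}(\hat\phi(q)-\phi(q)).
\]

Since $E[\hat\phi(q)-\phi(q)] = 0$ for each $q$, and
$\operatorname{var}[\hat\phi(q)-\phi(q)] = \frac{1}{n}(1-|\phi(q)|^2) \to 0$ as $n\to\infty$,
it follows that $\operatorname{p-lim}_{n\to\infty}(\hat\phi(q)-\phi(q)) = 0$
and consequently $\operatorname{p-lim}_{n\to\infty}\hat\phi(q) = 0$. As
$|\hat\phi(q)e^{iqx}| = |\hat\phi(q)|$, we also have

\[ \operatorname{p-lim}_{n\to\infty}|\hat\phi(q)e^{iqx}| = 0. \tag{4.6} \]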


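Now consider

\[
\hat\alpha_{(q)} = \frac{\hat\phi(q+1)}{\hat\phi(q)}
 = \frac{\dfrac{\phi(q+1)}{\phi(q)} + \dfrac{1}{\phi(q)}\big(\hat\phi(q+1)-\phi(q+1)\big)}
        {1 + \dfrac{1}{\phi(q)}\big(\hat\phi(q)-\phi(q)\big)}.
\]

By the above $\hat\phi(q+j)-\phi(q+j) = O_p(\frac{1}{\sqrt{n}})$ $(j = 0,1)$, and thus if
$\lim_{n\to\infty}\frac{1}{\phi(q)\sqrt{n}} = 0$ we have
$\operatorname{p-lim}_{n\to\infty}\hat\alpha_{(q)} = R$. Now,

\[ \lim_{n\to\infty}\frac{1}{\phi(q)\sqrt{n}} = \lim_{n\to\infty}\frac{R^q}{\phi(q)}\cdot\frac{1}{R^q\sqrt{n}}. \]

By hypothesis $R^q/\phi(q)$ is bounded, and so it is sufficient to show

\[ \lim_{n\to\infty}\frac{1}{|R^q|\sqrt{n}} = \lim_{n\to\infty}\frac{1}{r^q\sqrt{n}} = 0 \quad (r = |R|). \]

Now, $\ln(r^q\sqrt{n}) = q\ln r + \frac{1}{2}\ln n = \ln n\,\big(\frac{q\ln r}{\ln n} + \frac{1}{2}\big) \to \infty$
as $n\to\infty$, since $q = o(\ln n)$. Since $\ln(r^q\sqrt{n}) \to \infty$ we have
$r^q\sqrt{n} \to \infty$, and thus

\[ \lim_{n\to\infty}\frac{1}{r^q\sqrt{n}} = 0. \]

As stated above this implies that $\operatorname{p-lim}_{n\to\infty}\hat\alpha_{(q)} = R$
and consequently $\operatorname{p-lim}_{n\to\infty}(1-\hat\alpha_{(q)}e^{ix}) = 1 - Re^{ix} \ne 0$.
Along with (4.6), it then follows that

\[ \operatorname{p-lim}_{n\to\infty}\frac{\hat\phi(q)e^{iqx}}{1-\hat\alpha_{(q)}e^{ix}} = 0, \]

and the proof is complete.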
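As a concrete illustration of the estimator treated in Theorem 4.2, the following sketch evaluates the Fourier series estimate and the $e_1$-based estimate from a vector of coefficients. It is a minimal sketch under stated assumptions, not the author's program: the function names are hypothetical, the sample coefficient is taken to be $\hat\phi(v) = \frac{1}{n}\sum_j e^{-ivX_j}$ (the sign convention is an assumption), and the wrapped Cauchy and von Mises densities are used only as test cases.

```python
import numpy as np

def phi_hat(sample, v):
    """Sample Fourier coefficient (1/n) sum_j exp(-i v X_j); sign convention assumed."""
    return np.mean(np.exp(-1j * v * np.asarray(sample, dtype=float)))

def f_0m(x, coeffs):
    """Fourier series estimate (1/2pi)[1 + 2 Re F_m(x)], with coeffs[v-1] = phi(v)."""
    coeffs = np.asarray(coeffs)
    v = np.arange(1, len(coeffs) + 1)
    F_m = np.sum(coeffs * np.exp(1j * v * x))
    return (1.0 + 2.0 * F_m.real) / (2.0 * np.pi)

def f_1q(x, coeffs, q):
    """(1/2pi)[1 + 2 Re e1(F_q(x))]; needs coefficients for v = 1..q+1."""
    coeffs = np.asarray(coeffs)
    v = np.arange(1, q)
    F_prev = np.sum(coeffs[: q - 1] * np.exp(1j * v * x))       # F_{q-1}(x)
    alpha = coeffs[q] / coeffs[q - 1]                           # phi(q+1)/phi(q)
    e1 = F_prev + coeffs[q - 1] * np.exp(1j * q * x) / (1.0 - alpha * np.exp(1j * x))
    return (1.0 + 2.0 * e1.real) / (2.0 * np.pi)

# Exactness check with true coefficients phi(v) = rho**v (wrapped Cauchy density).
rho, x0 = 0.5, 0.7
true_coeffs = rho ** np.arange(1, 6)
f_true = (1 - rho**2) / (2 * np.pi * (1 - 2 * rho * np.cos(x0) + rho**2))
err_fourier = abs(f_0m(x0, true_coeffs[:3]) - f_true)   # uses phi(1..3)
err_e1 = abs(f_1q(x0, true_coeffs, q=2) - f_true)       # also uses phi(1..3)

# Sample-based use on simulated circular data (von Mises, for illustration only).
rng = np.random.default_rng(0)
sample = rng.vonmises(0.0, 2.0, size=500)
est_coeffs = np.array([phi_hat(sample, v) for v in range(1, 6)])
estimate = f_1q(x0, est_coeffs, q=3)
```

With exactly geometric coefficients the $e_1$-based value agrees with the density to rounding error, while the three-term Fourier series built from the same coefficients does not; with estimated coefficients both are of course subject to sampling error.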


The assumption that $q$ tends to $\infty$ at a rate slower than
$\ln n$ is undoubtedly more severe than is needed to induce $\hat f_{1,q}(\cdot)$
to converge. From the above proof it is clear that if

\[ \operatorname{p-lim}_{n\to\infty}(1-\hat\alpha_{(q)}e^{ix}) = Z(x), \]

where $Z(x)$ satisfies $P[Z(x) \ne 0] = 1$, then the result of Theorem 4.2
follows. However, the assumption that $q = o(\ln n)$ proves to be
advantageous since this assumption will be necessary in order to
prove subsequent more rapid convergence results.

In the next theorem we establish a more rapid convergence
property of $e_1(\hat F_q(x))$. This result and its proof closely parallel
the result and proof of McWilliams in the deterministic setting.

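Theorem 4.3 Under the conditions of Theorem 4.2 we have

\[ \operatorname{p-lim}_{n\to\infty}\frac{F_\infty(x) - e_1(\hat F_q(x))}{F_\infty(x) - \hat F_{q+1}(x)} = 0. \]

Proof: From the expression for $e_1(\hat F_q(x))$ obtained in the proof
of Theorem 4.2,

\[ \hat F_{q+1}(x) - e_1(\hat F_q(x)) = -\frac{\hat\phi(q+1)^2}{\hat\phi(q)}\cdot\frac{e^{i(q+2)x}}{1-\hat\alpha_{(q)}e^{ix}}. \]

Therefore

\[
\begin{aligned}
\frac{F_\infty(x) - e_1(\hat F_q(x))}{F_\infty(x) - \hat F_{q+1}(x)}
 &= 1 - \frac{1}{1-\hat\alpha_{(q)}e^{ix}}\cdot
      \frac{\hat\phi(q+1)^2 e^{i(q+2)x}/\hat\phi(q)}{F_\infty(x) - \hat F_{q+1}(x)} \\
 &= 1 - \frac{1}{1-\hat\alpha_{(q)}e^{ix}}\cdot\frac{\hat\phi(q+1)^2}{\hat\phi(q)\phi(q+2)}
      \Big\{[\phi(q+2)e^{i(q+2)x}]^{-1}\Big[F_{q+1}(x) - \hat F_{q+1}(x)
      + \sum_{v=q+2}^{\infty}\phi(v)e^{ivx}\Big]\Big\}^{-1}.
\end{aligned}
\]

Using arguments similar to that in Theorem 4.2, it is easily verified
that

\[
\operatorname{p-lim}_{n\to\infty}\frac{\hat\phi(q+1)^2}{\hat\phi(q)\phi(q+2)} = 1
\quad\text{and}\quad
\operatorname{p-lim}_{n\to\infty}[\phi(q+2)e^{i(q+2)x}]^{-1}[\hat F_{q+1}(x) - F_{q+1}(x)] = 0.
\]

In addition, McWilliams has shown that

\[ \lim_{n\to\infty}[\phi(q+2)e^{i(q+2)x}]^{-1}\sum_{v=q+2}^{\infty}\phi(v)e^{ivx} = \frac{1}{1-Re^{ix}} \]

under the assumption that $\lim_{m\to\infty}\frac{\phi(m)}{\phi(m-1)} = R$. It therefore
follows that

\[
\operatorname{p-lim}_{n\to\infty}\frac{F_\infty(x) - e_1(\hat F_q(x))}{F_\infty(x) - \hat F_{q+1}(x)}
 = 1 - \frac{1-Re^{ix}}{1-Re^{ix}} = 0.
\]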


Two points should be made about the result in Theorem 4.3.
First, the theorem should be regarded as an optimality property
of $\hat f_{1,q}(\cdot)$ but not as proof that $\hat f_{1,q}(\cdot)$ converges more rapidly
to $f(\cdot)$ than $\hat f_{0,q+1}(\cdot)$. In order to prove this result it is
necessary to show

\[ \operatorname{p-lim}_{n\to\infty}\frac{\operatorname{Real}(F_\infty(x) - e_1(\hat F_q(x)))}{\operatorname{Real}(F_\infty(x) - \hat F_{q+1}(x))} = 0, \]

which of course is not an immediate consequence of Theorem 4.3.
However, since $f(x)$ is completely determined by $F_\infty(x)$, an estimator
of $F_\infty(x)$ which has good properties is of considerable importance.

The second point to be made regarding Theorem 4.3 involves the
rate at which $q \to \infty$. In order to insure the convergence of $\hat f_{1,q}(\cdot)$
it was seen in Theorem 4.2 that we must have $q = o(\ln n)$. However,
restricting $q$ in this way in a comparison of $e_1(\hat F_q(x))$ and $\hat F_{q+1}(x)$
is, in a sense, unfair, since $\hat F_m(x)$ is consistent for $F_\infty(x)$ even
when $m = o(\sqrt{n})$. Ideally, a comparison of the rate of convergence
of $e_1(\hat F_q(x))$ and $\hat F_m(x)$ should be made with $q = o(\ln n)$ and $m = o(\sqrt{n})$.
By modifying the conditions of the theorem it is possible to show
that

\[ \operatorname{p-lim}_{n\to\infty}\frac{F_\infty(x) - e_1(\hat F_q(x))}{F_\infty(x) - \hat F_m(x)} = 0, \]

where $q = o(\ln n)$ and $m = o(\sqrt{n})$. This fact will be proven in
Theorem 4.4.

Before moving to Theorem 4.4 we shall examine the effect
which an assumption like (4.5) has on $\operatorname{var}[\hat F_m(x)]$. This is
important, as the rate at which $\hat F_m(x)$ converges to $F_\infty(x)$ is directly
affected by $\operatorname{var}[\hat F_m(x)]$. Now, since $\frac{\phi(m)}{\phi(m-1)}$ cannot converge to $R$
any faster than in the case where

\[ \frac{\phi(m)}{\phi(m-1)} = R \quad \text{for } m > m_0, \tag{4.7} \]

we will assume (4.7) and then calculate $\operatorname{var}[\hat F_m(x)]$. Under (4.7)
we have

\[ \phi(m) - R\phi(m-1) = 0 \quad \text{for } m > m_0, \]

which implies that $\phi(m_0+k) = \phi(m_0)R^k$, $k = 1,2,\ldots$. Using the
formula for $\operatorname{var}[\hat F_m(x)]$ obtained in Theorem 4.1, we have

\[
\begin{aligned}
\operatorname{var}[\hat F_m(x)]
 &= \frac{1}{n}\sum_{v=1}^{m}\big(1-|\phi(v)|^2\big)
  + \frac{1}{n}\sum_{k=1}^{m-1}\sum_{v=k+1}^{m}\big(\phi(v-k)-\phi(v)\phi(-k)\big)e^{i(v-k)x} \\
 &\quad + \frac{1}{n}\sum_{v=1}^{m-1}\sum_{k=v+1}^{m}\big(\phi(v-k)-\phi(v)\phi(-k)\big)e^{i(v-k)x}.
\end{aligned}
\]

The first term in this expression is $O(\frac{m}{n})$ regardless of what
assumption is made about $\phi(v)$, and thus only the covariance
terms need to be considered in investigating the rate of
convergence of $\operatorname{var}[\hat F_m(x)]$. The second covariance term is simply
the complex conjugate of the first, and so we consider only the
first term. Under assumption (4.7) this term is (for $m > m_0 + 1$)


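\[
\begin{aligned}
&\frac{1}{n}\sum_{k=1}^{m-1}\sum_{v=k+1}^{m}\phi(v-k)e^{i(v-k)x}
 - \frac{1}{n}\sum_{k=1}^{m_0}\phi(-k)e^{-ikx}\sum_{v=k+1}^{m}\phi(v)e^{ivx} \\
&\quad - \frac{|\phi(m_0)|^2}{n}\sum_{k=m_0+1}^{m-1}\bar R^{\,k-m_0}e^{-ikx}\sum_{v=k+1}^{m}R^{\,v-m_0}e^{ivx}.
\end{aligned}
\tag{4.8}
\]

It is easily verified that the last term in (4.8) is

\[
-\frac{|\phi(m_0)|^2}{n}\cdot\frac{|R|^2Re^{ix}}{1-Re^{ix}}
\left[\frac{1-|R|^{2(m-m_0-1)}}{1-|R|^2}
 - \frac{(Re^{ix})^{m-m_0-1}\big[1-(\bar Re^{-ix})^{m-m_0-1}\big]}{1-\bar Re^{-ix}}\right].
\]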

The  second  term  in  (4.8)  is 


also $O(\frac{m}{n})$, this implies that $\operatorname{var}[\hat F_m(x)] = O(\frac{m}{n})$. Under condition
(4.7) it thus follows that the truncation point $m$ may be allowed
to become large more quickly than generally stated. Specifically,
$\hat F_m(x)$ is consistent for $F_\infty(x)$ so long as $m = o(n)$.

We are now in a position to establish our most important
large sample result involving $e_1(\hat F_q(x))$.


Theorem 4.4 Under the conditions of Theorem 4.2 and the
additional assumptions that $\dfrac{\phi(m)}{R^m} = O(\sqrt{m})$ and


!  -  -f  £  %±£-elv3M  -  o  R*' 


*(q-l) 


\v-0 


<Kq) 


)  (4-’> 


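we have

\[ \operatorname{p-lim}_{n\to\infty}\frac{F_\infty(x) - e_1(\hat F_q(x))}{F_\infty(x) - \hat F_m(x)} = 0 \quad \text{for } x \in [-\pi,\pi], \]

where $m = [n^{\alpha}]$, $0 < \alpha < \frac{1}{2}$.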


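Proof:

\[
\begin{aligned}
\frac{F_\infty(x) - e_1(\hat F_q(x))}{F_\infty(x) - \hat F_m(x)}
 &= \frac{F_\infty(x) - \hat F_{q-1}(x) - \dfrac{\hat\phi(q)e^{iqx}}{1-\hat\alpha_{(q)}e^{ix}}}{F_\infty(x) - \hat F_m(x)} \\
 &= \frac{F_{q-1}(x) - \hat F_{q-1}(x) + \sum_{v=q}^{\infty}\phi(v)e^{ivx} - \dfrac{\hat\phi(q)e^{iqx}}{1-\hat\alpha_{(q)}e^{ix}}}
         {F_m(x) - \hat F_m(x) + \sum_{v=m+1}^{\infty}\phi(v)e^{ivx}} \\
 &= \frac{\sqrt{\tfrac{n}{m}}\big(F_{q-1}(x) - \hat F_{q-1}(x)\big)
          + \sqrt{\tfrac{n}{m}}\Big[\sum_{v=q}^{\infty}\phi(v)e^{ivx} - \dfrac{\hat\phi(q)e^{iqx}}{1-\hat\alpha_{(q)}e^{ix}}\Big]}
         {\sqrt{\tfrac{n}{m}}\big(F_m(x) - \hat F_m(x)\big)
          + \sqrt{\tfrac{n}{m}}\,\phi(m+1)e^{i(m+1)x}\sum_{v=0}^{\infty}\dfrac{\phi(m+1+v)}{\phi(m+1)}e^{ivx}}.
\end{aligned}
\tag{4.10}
\]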
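We have

\[ \lim_{n\to\infty}\sqrt{\frac{n}{m}}\,\phi(m+1) = \lim_{n\to\infty}\sqrt{\frac{n}{m}}\,R^{m+1}\cdot\frac{\phi(m+1)}{R^{m+1}} = 0 \]

since

\[ \frac{\phi(m+1)}{R^{m+1}} = O(\sqrt{m}) \quad\text{and}\quad m = [n^{\alpha}]. \]

Also recall that

\[ \lim_{n\to\infty}\sum_{v=0}^{\infty}\frac{\phi(m+1+v)}{\phi(m+1)}e^{ivx} = \frac{1}{1-Re^{ix}}. \]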

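The previous considerations concerning $\operatorname{var}[\hat F_m(x)]$ indicate that
one of the following holds:

\[ \operatorname{var}\Big[\sqrt{\frac{n}{m}}\big(\hat F_m(x) - F_m(x)\big)\Big] \to c_1 \ne 0 \]

or

\[ \operatorname{var}\Big[\sqrt{\frac{n}{m}}\big(\hat F_m(x) - F_m(x)\big)\Big] \to \infty \quad \text{as } n\to\infty. \]

It thus follows that the denominator of (4.10) converges in
probability to a random variable $Z(x)$ which satisfies $P[Z(x) \ne 0] = 1$.
Therefore, in order to prove the result it is sufficient to show
that the numerator of (4.10) converges in probability to zero.
Clearly (since $q = o(\ln n)$ and $m = [n^{\alpha}]$) the first term of the
numerator goes to zero in probability as $n\to\infty$. The second term
is (with $a_q(x) = \sum_{v=0}^{\infty}\frac{\phi(q+v)}{\phi(q)}e^{ivx}$)

\[
\sqrt{\frac{n}{m}}\,\phi(q)e^{iqx}
\left[\frac{a_q(x)\big(1-\hat\alpha_{(q)}e^{ix}\big) - \dfrac{\hat\phi(q)}{\phi(q)}}{1-\hat\alpha_{(q)}e^{ix}}\right].
\]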
The denominator of the expression in brackets converges in
probability to $(1-Re^{ix})$. The numerator may be expressed as

\[
a_q(x)\left[1 - \frac{\dfrac{\phi(q+1)}{\phi(q)} + \dfrac{O_p(1/\sqrt{n})}{\phi(q)}}{1 + \dfrac{O_p(1/\sqrt{n})}{\phi(q)}}\,e^{ix}\right]
 - \left(1 + \frac{O_p(1/\sqrt{n})}{\phi(q)}\right).
\]

The limit in probability of the numerator of (4.10) is thus

\[
\frac{1}{1-Re^{ix}}\lim_{n\to\infty}\sqrt{\frac{n}{m}}\,\phi(q)e^{iqx}
\Big[a_q(x)\Big(1 - \frac{\phi(q+1)}{\phi(q)}e^{ix}\Big) - 1\Big].
\tag{4.11}
\]

Clearly $\operatorname{p-lim}_{n\to\infty}\sqrt{\frac{n}{m}}\,O_p(\frac{1}{\sqrt{n}}) = 0$, and, as noted in Theorem 4.2,
$\operatorname{p-lim}_{n\to\infty}\dfrac{O_p(1/\sqrt{n})}{\phi(q)} = 0$ and
$\lim_{n\to\infty}a_q(x) = \dfrac{1}{1-Re^{ix}}$.
Thus, (4.11) is simply

\[
\frac{1}{(1-Re^{ix})^2}\lim_{n\to\infty}\sqrt{\frac{n}{m}}\,\phi(q)e^{iqx}
\Big[\Big(1 - \frac{\phi(q+1)}{\phi(q)}e^{ix}\Big) - a_q^{-1}(x)\Big],
\]

which by hypothesis is zero, and the proof is complete.

In light of previous considerations we point out that, if
instead of assuming (4.9) we assume that (4.7) holds and $m = [n^{\alpha}]$,
$0 < \alpha < 1$, then the preceding proof remains valid.
Assumption (4.9), which is crucial to the proof of
Theorem 4.4, has to do with how quickly the ratio $\frac{\phi(q+1)}{\phi(q)}$
approaches $R$. Note that since $\lim_{n\to\infty}a_q^{-1}(x) = 1 - Re^{ix}$,

\[ \Big(1 - \frac{\phi(q+1)}{\phi(q)}e^{ix}\Big) - a_q^{-1}(x) \to 0 \]

as $n\to\infty$. However, in order for (4.9) to be satisfied $\frac{\phi(q+1)}{\phi(q)}$ must
converge to $R$ rapidly enough to compensate for the fact that
$\sqrt{\frac{n}{m}}\,|R^q| \to \infty$.
The convergence of $\frac{\phi(q+1)}{\phi(q)}$ to $R$ is the more rapid of the two, and
(4.9) is obviously satisfied, in the case where (4.7) holds.
Condition (4.9) may thus be regarded as an indication of how far the
Fourier coefficients of $f(\cdot)$ may depart from the model
"$\phi(m) - R\phi(m-1) = 0$ for $m > m_0$" while still maintaining the property

\[ \operatorname{p-lim}_{n\to\infty}\frac{F_\infty(x) - e_1(\hat F_q(x))}{F_\infty(x) - \hat F_m(x)} = 0. \]

We would indeed be remiss if the current section was concluded
without a discussion of how the previous results involving $\hat f_{1,q}(\cdot)$
should properly fit into a general approach to density estimation.
First, if the Fourier series estimator $\hat f_{0,m}(\cdot)$ is employed to
estimate $f(\cdot)$ then the results of this section provide a justification
for at least considering $\hat f_{1,q}(\cdot)$ as an alternative to $\hat f_{0,m}(\cdot)$. The
nonparametric nature of the density estimation problem does not allow
us to make specific assumptions (such as those in Theorems 4.2 and 4.4)
about the underlying density, but this fact should not blind us to the
realization that one estimator may perform better than another in
certain situations. In Theorem 4.4 conditions were established
under which $\hat f_{1,q}(\cdot)$ would reasonably be expected to perform better
than $\hat f_{0,m}(\cdot)$. With these thoughts in mind, the only question
which remains is the following. For a given data set, how does
one recognize if the situation calls for the use of $\hat f_{1,q}(\cdot)$ (for
some $q$) rather than $\hat f_{0,m}(\cdot)$? Two possible answers to this
question will be offered in Chapter V.


4.4 Extending Results Involving $\hat f_{1,q}(\cdot)$ to Higher Orders


As mentioned in the previous section, McWilliams has investigated
certain properties of the $e_2$-transform. Since

\[ \hat f_{2,q}(x) = \frac{1}{2\pi}[1 + 2\operatorname{Real}(e_2(\hat F_q(x)))] \quad (\text{for } q \ge 2), \]

some insight into when $\hat f_{2,q}(\cdot)$ may be of value as an estimator of
$f(\cdot)$ can be gained by considering the theorems of McWilliams
involving $e_2(A_m)$. The following two theorems, stated without proof,
have been proven by McWilliams (1969).


Theorem 4.5 If $A_m \to A_\infty$, $\dfrac{a_{m+1}}{a_m} = R_m \to R \ne 1$, and

\[ \lim_{m\to\infty}\frac{R_{m+1} - R_m}{R_{m+2} - R_{m+1}} = Q \ne R, \]

then $e_2(A_m) \to A_\infty$.


Theorem 4.6 If the conditions of Theorem 4.5 are satisfied, and
if further $R \ne 0$ and


-  R  , 


then

\[ \lim_{m\to\infty}\frac{A_\infty - e_2(A_{m+2})}{A_\infty - A_{m+j}} = 0 \quad \text{for any } j. \]


Using the results of Theorems 4.5 and 4.6 and proceeding as in
Section 4.3 one could undoubtedly establish results for $e_2(\hat F_q(x))$
paralleling those of the previous section. The proofs of these
results would be somewhat more tedious than those in the first
order case (due to the increased complexity of the $e_2$-transform),
and perhaps not altogether necessary. It seems that the proven
worth of $e_2(A_m)$ (as evidenced in Theorems 4.5 and 4.6) in the
deterministic setting is alone a motivation for considering
$\hat f_{2,q}(\cdot)$ to be, in certain situations, a viable competitor of
the Fourier series estimator.

Except for the situation in which $\{a_m\} \in L(n,\Delta)$, conditions
insuring the convergence of $e_n(A_m)$ and the more rapid convergence
of $e_n(A_m)$ than $A_{n+m}$ have not been established for the cases where
$n \ge 3$. However, the importance of the exactness result obtained
in Theorem 2.2 should not be overlooked. Even when the assumption
$\{a_m\} \in L(n,\Delta)$ is only approximately satisfied, $e_n(A_m)$ can be expected
to be of considerable value. This fact was demonstrated in the
examples of Chapter II. Likewise using Theorem 2.2 as a
justification for considering $\hat f_{p,q}(\cdot)$ to be a candidate estimator of $f(\cdot)$,
there remains the problem of how to select $p$ and $q$. This problem
is the subject of the next chapter.
CHAPTER V

THE PROBLEM OF SELECTING p AND q

5.1 Introduction

Up to this point, the primary objective of this work has
been to illustrate and discuss the various reasons why the class
of ARMA estimators is of value in the density estimation problem.
Armed with a suitable class of estimators, we are left, however,
with the practical problem of choosing an appropriate estimator
(based on data $X_1,\ldots,X_n$) from this class. Given a realization
$x_1,\ldots,x_n$ from $f(\cdot)$, the class of ARMA estimates of the density
function is indexed only by $p$ and $q$, and so choosing an appropriate
estimate is equivalent to choosing appropriate values of $p$ and $q$.

All density estimation methods have a problem similar to
the one described above. Typically a class of estimates is
indexed by a parameter, often referred to as a smoothing
parameter, and a suitable value of this parameter must be chosen in
order to arrive at a final estimate of $f(\cdot)$. For example, when
employing a kernel density estimator a suitable choice of the
smoothing parameter (or window width) $h$ must be made. Duin (1976)
and Hermans and Habbema (1976) have proposed a modified
maximum-likelihood approach to the problem of choosing $h$. In addition,
Silverman (1980) has suggested a method for choosing the window
width based on the so-called test graph theorem. The problem of
choosing a smoothing parameter also arises in a method proposed by
Wahba (1977). Wahba's estimator is (for $x \in [0,1]$ and $n$ even)


-  ni 

v--n/2  1+X(2irv) 


2ir±vx 


which is seen to be a Fourier series estimator to which a low-pass
filter has been applied. The smoothing of $\hat f(\cdot)$ is accomplished by
varying the parameter $\lambda$ rather than the truncation point of the
Fourier series as in the method of Kronmal and Tarter. Wahba (1978)
chooses $\lambda$ so that the estimated MISE of $\hat f(\cdot)$ is a minimum. The
problem we have discussed is shared even by the primitive histogram
estimator, whose smoothing parameters are the number and size of
its class intervals.

In ARMA density estimation, the pair of values $(p,q)$ may be
regarded as the smoothing parameter. In the remainder of this
chapter two different methods for choosing this parameter will be
presented. In the first method we propose the use of the S-array
for choosing $(p,q)$, since the ARMA $(p,q)$ representation for $f(\cdot)$
is equivalent to assuming that $\{\phi(v)\} \in L(p,\Delta)$ for $v > q$. In the
second method $(p,q)$ is chosen in such a way that the estimated
MISE of $\hat f_{p,q}(\cdot)$ is minimized over a suitably restricted subclass
of ARMA estimates.


5.2 S-Array Method of Selecting p and q

As proven in Chapter II, the assumption that a function $f(\cdot)$
has an ARMA $(p,q)$ representation is essentially equivalent to
the assumption that

\[ \{\phi(v)\} \in L(p,\Delta) \quad \text{for } v > q. \]

Therefore, given estimated Fourier coefficients $\hat\phi(1), \hat\phi(2), \ldots, \hat\phi(M)$, a
natural way of selecting $(p,q)$ is to examine an S-array (see Gray,
Kelley, and McIntire (1978)) composed of values $S_n(\hat\phi(m)e^{imx})$
($x \in [-\pi,\pi]$). A pattern of the form

\[ S_{p'}(\hat\phi(m)e^{imx}) \approx c_1, \quad m \ge q', \]

and

\[ S_{p'}(\hat\phi(m)e^{imx}) \approx c_2, \quad m \le -q' - 1, \]

supports the choice of $(p',q')$ for the smoothing parameter $(p,q)$ in
the sense that such a pattern supports the existence of a similar
pattern in the S-array based on $S_n(\phi(m)e^{imx})$.

Some experience with simulated data has shown that even when
a good constancy pattern exists in the parametric S-array, the
sample S-array tends to be more noisy than arrays encountered in
time series applications. The method of selecting $(p,q)$ discussed
above must undoubtedly, then, involve a good deal of subjectivity.
For this reason the S-array should be regarded as a tool for pointing
out a restricted class of candidate ARMA estimates. Additional
analysis may be performed on the restricted class of estimates to
determine a final estimate of $f(\cdot)$.

One possibility for arriving at a final estimate would be to
perform some sort of smoothing in the S-array columns where
constancy patterns are apparent. Tukey (1978) has suggested the use
of his 3RSSH smoothing procedure as a means of making noisy patterns
in the S-array more informative. After smoothing competing columns
a choice for $(p,q)$ may become obvious. A second possibility for
obtaining an estimate would be to estimate the MISE for each
candidate in the restricted class of ARMA estimates, and then
choose that estimate which minimizes the estimated MISE. A
procedure for estimating MISE is discussed in the next section.

In Chapter VI, the S-array method of selecting $(p,q)$
will be exemplified in the analysis of two different data sets.
The smoothing procedure discussed previously has not been
investigated, but we do examine the estimated MISE criterion.

5.3 MISE Criterion for Selecting p and q

Ideally we would like to choose an estimator from the ARMA
class which satisfies some optimality criterion with respect to
$f(\cdot)$. A criterion which is common in the estimation of probability
density functions is to seek an estimator which minimizes the MISE.
In ARMA density estimation this entails choosing $(p,q)$ such that

\[ \operatorname{MISE}(\hat f_{p,q}) = E\Big[\int_{-\pi}^{\pi}\big(\hat f_{p,q}(x) - f(x)\big)^2\,dx\Big] \]

is minimized. This, however, is an impossible task since the
optimal value of $(p,q)$ depends on $f(\cdot)$, the function which is to
be estimated. Therefore, given data $x_1,\ldots,x_n$, our approach will
be to choose as our estimate of $f(\cdot)$ that $\hat f_{p,q}(\cdot)$ for which an
estimated $\operatorname{MISE}(\hat f_{p,q})$ is minimized. In the remainder of this
section we discuss the problem of estimating $\operatorname{MISE}(\hat f_{p,q})$.


Consider

\[
\int_{-\pi}^{\pi}\big(\hat f_{p,q}(x) - f(x)\big)^2dx
 = \int_{-\pi}^{\pi}\hat f_{p,q}^{\,2}(x)\,dx
 - 2\int_{-\pi}^{\pi}\hat f_{p,q}(x)f(x)\,dx
 + \int_{-\pi}^{\pi}f^2(x)\,dx.
\]

From this expression it is clear that the value of $(p,q)$ which
minimizes

\[ J(\hat f_{p,q}) = E\Big[\int_{-\pi}^{\pi}\hat f_{p,q}^{\,2}(x)\,dx - 2\int_{-\pi}^{\pi}\hat f_{p,q}(x)f(x)\,dx\Big] \]

also minimizes $\operatorname{MISE}(\hat f_{p,q})$. It is therefore sufficient to
consider only the estimation of $J(\hat f_{p,q})$. This observation greatly
simplifies our problem.
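The decomposition can be illustrated numerically for fixed candidate estimates (before any expectation is taken). The sketch below is only an illustration under stated assumptions: the "true" density is a wrapped Cauchy chosen for convenience, the candidates are its truncated Fourier series, and integrals are approximated by a rectangle rule on a periodic grid.

```python
import numpy as np

x = np.linspace(-np.pi, np.pi, 2048, endpoint=False)
dx = x[1] - x[0]

def integral(values):
    """Rectangle-rule integral over [-pi, pi) for values on the periodic grid x."""
    return float(np.sum(values) * dx)

rho = 0.5
f = (1 - rho**2) / (2 * np.pi * (1 - 2 * rho * np.cos(x) + rho**2))  # illustrative density

def truncated(m):
    """Fourier series approximation of f using the true coefficients rho**v, v = 1..m."""
    v = np.arange(1, m + 1)[:, None]
    return (1 + 2 * np.sum(rho**v * np.cos(v * x), axis=0)) / (2 * np.pi)

candidates = {m: truncated(m) for m in range(6)}
ise = {m: integral((g - f) ** 2) for m, g in candidates.items()}
j_term = {m: integral(g**2) - 2 * integral(g * f) for m, g in candidates.items()}
const = integral(f**2)                     # does not depend on the candidate

best_by_ise = min(ise, key=ise.get)
best_by_j = min(j_term, key=j_term.get)
print(best_by_ise, best_by_j)
```

Because the last term of the decomposition is the same for every candidate, ranking candidates by the first two terms gives the same ordering as ranking them by integrated square error.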

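Recalling relationship (3.6) we have

\[
\begin{aligned}
J(\hat f_{p,q})
 &= E\Big[\int_{-\pi}^{\pi}\hat f_{0,p+q}^{\,2}(x)\,dx + 2\int_{-\pi}^{\pi}\hat f_{0,p+q}(x)\hat g_{p,q}(x)\,dx
   + \int_{-\pi}^{\pi}\hat g_{p,q}^{\,2}(x)\,dx \\
 &\qquad - 2\int_{-\pi}^{\pi}\hat f_{0,p+q}(x)f(x)\,dx - 2\int_{-\pi}^{\pi}\hat g_{p,q}(x)f(x)\,dx\Big] \\
 &= J(\hat f_{0,p+q}) + J(\hat g_{p,q}) + 2E\Big[\int_{-\pi}^{\pi}\hat f_{0,p+q}(x)\hat g_{p,q}(x)\,dx\Big].
\end{aligned}
\]

Now $J(\hat f_{0,p+q})$ has a particularly simple form in terms of $\phi(1),\ldots,\phi(p+q)$
for which there is an unbiased estimator. We have

\[
\begin{aligned}
J(\hat f_{0,p+q})
 &= E\Big[\frac{1}{2\pi}\Big(1 + 2\sum_{v=1}^{p+q}|\hat\phi(v)|^2\Big)\Big] - 2E\Big[\int_{-\pi}^{\pi}\hat f_{0,p+q}(x)f(x)\,dx\Big] \\
 &= E\Big[\frac{1}{2\pi}\Big(1 + 2\sum_{v=1}^{p+q}|\hat\phi(v)|^2\Big)\Big] - \frac{1}{\pi}\Big(1 + 2\sum_{v=1}^{p+q}|\phi(v)|^2\Big).
\end{aligned}
\]

Since $E\big(\frac{n}{n-1}|\hat\phi(v)|^2 - \frac{1}{n-1}\big) = |\phi(v)|^2$, it follows that an unbiased
estimate of $J(\hat f_{0,p+q})$ is

\[
\begin{aligned}
\hat J(\hat f_{0,p+q})
 &= \frac{1}{2\pi}\Big(1 + 2\sum_{v=1}^{p+q}|\hat\phi(v)|^2\Big)
  - \frac{1}{\pi}\Big[1 + 2\sum_{v=1}^{p+q}\Big(\frac{n}{n-1}|\hat\phi(v)|^2 - \frac{1}{n-1}\Big)\Big] \\
 &= -\frac{1}{2\pi}\Big[1 + \frac{2(n+1)}{n-1}\sum_{v=1}^{p+q}|\hat\phi(v)|^2\Big] + \frac{2(p+q)}{\pi(n-1)}.
\end{aligned}
\]

Interestingly, it is seen that the first term of $\hat J(\hat f_{0,p+q})$ decreases
as $p+q$ increases, but that the second term penalizes an increase
in $p+q$. Thus, $\hat J(\hat f_{0,p+q})$ is sensitive to both the fidelity
and stability of $\hat f_{0,p+q}$. In addition, it is easily verified
that $\hat J(\hat f_{0,m+1}) - \hat J(\hat f_{0,m})$ is the same estimate of
$\operatorname{MISE}(\hat f_{0,m+1}) - \operatorname{MISE}(\hat f_{0,m})$ as that derived by Kronmal and Tarter (1968). This
fact points out a correspondence between the optimal stopping
rule of Kronmal and Tarter and $\hat J(\hat f_{0,m})$.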
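The estimate of $J$ for the Fourier series members of the class is simple to compute. The following is a minimal sketch, assuming the estimate has the form $\frac{1}{2\pi}(1 + 2\sum|\hat\phi(v)|^2) - \frac{1}{\pi}[1 + 2\sum(\frac{n}{n-1}|\hat\phi(v)|^2 - \frac{1}{n-1})]$ implied by the unbiasedness relation above; the function name and the input values are hypothetical.

```python
import numpy as np

def j_hat_fourier(phi_hat_abs2, n):
    """Estimated J for the Fourier series estimates with m = 0, 1, ..., M terms.

    phi_hat_abs2[v-1] holds |phi_hat(v)|^2 for v = 1..M; n is the sample size.
    Element m of the result is the estimate for truncation point m.
    """
    abs2 = np.asarray(phi_hat_abs2, dtype=float)
    raw = np.concatenate(([0.0], np.cumsum(abs2)))
    unbiased = np.concatenate(([0.0], np.cumsum(n / (n - 1) * abs2 - 1.0 / (n - 1))))
    return (1 + 2 * raw) / (2 * np.pi) - (1 + 2 * unbiased) / np.pi

# Hypothetical squared moduli of estimated coefficients, sample size n = 101.
n = 101
abs2 = np.array([0.3, 0.05, 0.001, 0.03])
J = j_hat_fourier(abs2, n)
steps = np.diff(J)                 # change in estimated J when term v is added
best_m = int(np.argmin(J))
print(J, best_m)
```

Under this form, adding the term with index $v$ lowers the estimate exactly when $|\hat\phi(v)|^2 > 2/(n+1)$, which is one way to see the correspondence with a term-by-term inclusion rule.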

u  ,m 

An unbiased estimator of the last term of $J(\hat f_{p,q})$ is

\[ 2\int_{-\pi}^{\pi}\hat f_{0,p+q}(x)\hat g_{p,q}(x)\,dx, \]

which is zero whenever the estimate $\hat f_{p,q}(\cdot)$ satisfies condition S.
This leaves us with the problem of estimating

\[ J(\hat g_{p,q}) = E\Big[\int_{-\pi}^{\pi}\hat g_{p,q}^{\,2}(x)\,dx\Big] - 2E\Big[\int_{-\pi}^{\pi}\hat g_{p,q}(x)\,dF(x)\Big]. \]

Since an unbiased estimator of $E[\int_{-\pi}^{\pi}\hat g_{p,q}^{\,2}(x)\,dx]$ is $\int_{-\pi}^{\pi}\hat g_{p,q}^{\,2}(x)\,dx$,
we focus our attention on $E[\int_{-\pi}^{\pi}\hat g_{p,q}(x)\,dF(x)]$. To estimate this
quantity we propose the use of the bootstrap mechanism of Efron
(1979).

In order to illustrate the bootstrap in this setting, let
$X = (X_1,\ldots,X_n)$ denote a random sample from $f(\cdot)$. Further, let
$\hat g_{p,q,X}(\cdot)$ indicate that $\hat g_{p,q}(\cdot)$ is based on the sample $X$, and
write

\[ R(X,F) = \int_{-\pi}^{\pi}\hat g_{p,q,X}(x)\,dF(x). \]

If $x = (x_1,\ldots,x_n)$ is a realization of $X$ with corresponding empirical
cdf $F_n(\cdot)$, then the bootstrap estimate of $E[R(X,F)]$ is

\[ E[R(X^*,F_n)] = E\Big[\frac{1}{n}\sum_{j=1}^{n}\hat g_{p,q,X^*}(x_j)\Big], \]

where $X^*$ is a random sample from $F_n(\cdot)$. This estimate is seen
to be Fisher consistent, or in other words, the estimate is
equal to the parameter it estimates when $F_n(\cdot) = F(\cdot)$.

As it is not possible to analytically evaluate $E[R(X^*,F_n)]$,
Efron suggests that numerous samples $X^*$ be generated from $F_n(\cdot)$
in order to empirically evaluate the expectation to a close
approximation. In our application this procedure would be prohibitive
since $E[R(X^*,F_n)]$ must be evaluated for numerous different
candidate estimators, $\hat f_{p,q}(\cdot)$. Fortunately, Efron also derives a second
order approximation to $E[R(X^*,F_n)]$ by expanding $R(P^*) = R(X^*,F_n)$ in
a Taylor series about $\frac{1}{n}(1,1,\ldots,1)$ where $P^* = (P_1^*,\ldots,P_n^*)$ and
$P_j^* = \frac{1}{n}$(number of $X_i^*$'s which equal $x_j$). ($R(X^*,F_n)$ depends on
$X^*$ only through $P^*$ since $R$ is symmetric in the $X^*$'s.)
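For reference, the direct Monte Carlo evaluation of the bootstrap expectation (the procedure described above as prohibitive when repeated over many candidates) can be sketched as follows. This is an illustrative sketch only: `fit` is a hypothetical user-supplied function standing in for the construction of $\hat g_{p,q,X}(\cdot)$ from a sample, and the number of replicates is arbitrary.

```python
import numpy as np

def bootstrap_expectation(x, fit, n_boot=200, seed=0):
    """Monte Carlo estimate of E[R(X*, F_n)] = E[(1/n) sum_j g_{X*}(x_j)].

    `fit` maps a sample to a vectorised function g (a stand-in for the
    correction term based on that sample); x is the observed data.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    values = []
    for _ in range(n_boot):
        x_star = rng.choice(x, size=x.size, replace=True)   # random sample from F_n
        g_star = fit(x_star)
        values.append(np.mean(g_star(x)))                   # integral of g dF_n
    return float(np.mean(values))

# Toy stand-in: g based on a sample is the constant function equal to the sample mean.
def toy_fit(sample):
    level = float(np.mean(sample))
    return lambda t: np.full_like(np.asarray(t, dtype=float), level)

data = np.linspace(-1.0, 1.0, 50)
estimate = bootstrap_expectation(data, toy_fit, n_boot=2000, seed=1)
```

Each candidate $(p,q)$ would require its own set of replicates under this direct approach, which is the cost that the second order approximation is meant to avoid.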


Wong (1979) has derived an explicit expression for Efron's
approximation to $E[R(X^*,F_n)]$ which he shows, in fact, to be a
jackknife approximation to the bootstrap. We have

n 


E[R(X*,Fn)] 

where  ■  (x^,. 

implies  that 


■r») ' 


(n-l)R(x,F  ) , 
n 

x  ) .  In  our  problem  this 


E[R(X*,F  ) ] 
n 


n  ,  n  * 

Z  g 

ni«l  p»9*x 


‘  Z 

J-l 


(j) 


(x^)  - 


(n-l) 


n  » 

n  i-l8p' 


An even simpler approximation is possible by noting that (for $n$
reasonably large) $\hat g_{p,q,x_{(j)}}(t) \approx \hat g_{p,q,x}(t)$ except possibly for
$t$ in a neighborhood of $x_j$. This observation gives

The considerations of Chapter IV indicate that, if $M$ is chosen
to be a function of the sample size $n$, it would be reasonable
to have $M = o(\sqrt{n})$.

In the next chapter, we will investigate the ARMA
estimator by means of simulated data.




CHAPTER  VI 


AKMA  DENSITY  ESTIMATION  IN  PRACTICE,  AND  A  SUMMARY 

6.1  Introduction 

In  this  final  chapter  the  use  of  ARMA  representations 
In  density  estimation  is  exemplified  with  the  aid  of  both  real 
and  simulated  data.  In  Sections  6.2  and  6.3  two  data  sets 
which  have  appeared  previously  in  the  literature  are  considered. 
The  LRL  data  of  Good  and  Gaskins  (1980)  and  the  Maguire  data  of 
Maguire,  Pearson,  and  Wynn  (1952)  are  analyzed,  and  density 
estimates  are  obtained  using  the  results  of  Chapter  V.  The 
effectiveness  of  the  estimated  MISE  criterion  for  choosing  p 
and  q  is  evaluated  in  Section  6.4  by  means  of  simulated  data. 

The  results  of  the  simulation  study  show  the  criterion  to  be 
quite  effective  in  distinguishing  between  density  estimates 
which  have  important  differences  in  ISE. 

Section 6.5 is devoted to summarizing the density estimation
results obtained in this work. In addition, some areas for
future research in ARMA density estimation are indicated.

6.2  The  LRL  Data 

Good and Gaskins (1980) have analyzed a data set, which
they call the LRL (Lawrence Radiation Laboratory) data, consisting
of "n = 25,752 events from a scattering reaction". The
data are recorded in the paper of Good and Gaskins in the form
of a frequency table made up of 172 bins of width 10 MeV each.
The ith bin includes $n_i$ events, and the bins are centered at
the values (in MeV)

$y_i = 285 + 10(i-1), \quad i = 1, 2, \ldots, 172.$

In the analysis to follow we consider the transformed data

$x_i = \frac{\pi}{1720}(2y_i - 2280), \quad i = 1, 2, \ldots, 172.$

The Fourier coefficients $\phi(v)$, $v = 1, 2, \ldots$, associated with the
density $f(\cdot)$ of the transformed data are estimated by

$\hat{\phi}(v) = \frac{1}{n} \sum_{j=1}^{172} n_j e^{-ivx_j}, \quad v = 1, 2, \ldots .$   (6.1)
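The transformation and the estimate (6.1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the bin counts below are invented (they are not the LRL frequencies), the function names are hypothetical, and the sign of the exponent is taken to follow the convention $\phi(v) = \int e^{-ivx} f(x)\,dx$ used for the Maguire data in Section 6.3.

```python
import cmath
import math

def transformed_centers(num_bins=172, first_center=285.0, width=10.0):
    """Map bin centers y_i = 285 + 10(i-1) to x_i = (pi/1720)(2*y_i - 2280)."""
    centers = [first_center + width * i for i in range(num_bins)]
    return [math.pi / 1720.0 * (2.0 * y - 2280.0) for y in centers]

def fourier_coefficients(counts, xs, max_v):
    """Grouped-data estimate (1/n) * sum_j n_j * exp(-i*v*x_j) for v = 1..max_v."""
    n = sum(counts)
    return [
        sum(c * cmath.exp(-1j * v * x) for c, x in zip(counts, xs)) / n
        for v in range(1, max_v + 1)
    ]

xs = transformed_centers()
# Invented counts (NOT the LRL data): one smooth bump plus a flat background.
counts = [int(100 * math.exp(-((x + 1.4) ** 2) / 0.1)) + 1 for x in xs]
phi = fourier_coefficients(counts, xs, max_v=5)
```

With bins of width 10 MeV the outer bin edges (280 and 2000 MeV) map exactly to $-\pi$ and $\pi$, so every transformed center lies strictly inside $[-\pi, \pi]$.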

The aim of Good and Gaskins in analyzing the LRL data was
to obtain an estimate of the underlying probability density by
using their maximum penalized-likelihood method (see Good and
Gaskins (1971)), and to then describe a procedure for assessing
the likelihood that a bump found in the estimate is also present
in the underlying density. Our purpose in analyzing the LRL
data is to

(i) illustrate the cogent information contained in the
S-array about the type of ARMA estimate which should
be fit, and

(ii) obtain an estimate comparable to that of Good
and Gaskins.

In this example, the estimated MISE criterion for choosing (p,q)
is considered only for estimates with p = 0, as some modification
of the jackknife approximation to the bootstrap is needed
for grouped data.

Table 6.1 shows a portion of the S-array for the sequence
$\{(-1)^m \hat{\phi}(m)\}$, where the $\hat{\phi}(m)$ are as defined in (6.1). The array
based on $\{(-1)^m \hat{\phi}(m)\}$ has been tabled since it shows a much clearer
constancy pattern than does the array based on $\{\hat{\phi}(m)\}$. This
behavior is caused by the fact that, as will be seen shortly,
estimates of the density $f(\cdot)$ have considerably more "power"
near $x = 0$ than near $x = \pi$.
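A remark that is not made explicitly in the text but follows from the definition of the coefficients: since $e^{-im(x+\pi)} = (-1)^m e^{-imx}$, multiplying $\hat{\phi}(m)$ by $(-1)^m$ gives the coefficients of the same data shifted by $\pi$ around the circle. The minimal check below uses made-up data points and hypothetical function names, and assumes the sign convention $\hat{\phi}(m) = n^{-1}\sum_j e^{-imx_j}$.

```python
import cmath
import math

def phi_hat(xs, m):
    """Empirical Fourier coefficient (1/n) * sum_j exp(-i*m*x_j)."""
    return sum(cmath.exp(-1j * m * x) for x in xs) / len(xs)

def alternated(xs, m):
    """(-1)^m * phi_hat(m): the sequence on which the tabled S-array is based."""
    return (-1) ** m * phi_hat(xs, m)

def shifted(xs, m):
    """Coefficient of the data after a shift by pi around the circle."""
    return phi_hat([x + math.pi for x in xs], m)

points = [-1.45, -1.40, -1.32, 0.50, 0.58]  # made-up values, for illustration only
max_gap = max(abs(alternated(points, m) - shifted(points, m)) for m in range(1, 8))
```

Here `max_gap` is zero up to rounding error, so the S-array of the sign-alternated sequence is simply the S-array of the shifted data.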

The constancy apparent in the first two columns of the
array in Table 6.1 gives clear preference to ARMA estimates with
p = 1 or p = 2. However, a fuller understanding of the information
contained in Table 6.1 can be gained by initially considering
the estimates $\hat{f}_{1,0}(\cdot)$ and $\hat{f}_{2,0}(\cdot)$, which are plotted in
Figures 6.1 and 6.2 respectively. From these two figures it
is clear that the constancy in the first column of the S-array
corresponds to a bump (in the terminology of Good and Gaskins)
at about x = -1.40, and the constancy in the second column
corresponds to this same bump and another smaller bump at about
x = .55. Interestingly, $\hat{f}_{2,0}(\cdot)$ is virtually the same estimate
as that obtained by Good and Gaskins except for the presence of
11 additional, very small bumps in their estimate.

An area for future research is establishing a method of
transforming the original sequence of estimated Fourier coefficients
in such a way that the dominating effect of major peaks
is filtered out. In the current example, such a method would
allow us to remove the effect of the two major peaks (seen in
Fig. 6.2) so that the possible presence of smaller peaks could
be carefully investigated.

LRL DATA S-ARRAY FOR $\{(-1)^v \hat{\phi}(v)\}$

In the absence of a suitable filtering technique, we
arrive at a final estimate of $f(\cdot)$ by choosing a Fourier series
estimate $\hat{f}_{0,m}(\cdot)$ which satisfies $m \in \{1, 2, \ldots, 50\}$ and

$\hat{J}(\hat{f}_{0,m}) \le \hat{J}(\hat{f}_{0,k})$, for $k = 1, 2, \ldots, 50$.

Note that 50 is certainly not too large a truncation point to
consider in this case because of the extremely large size of
the sample.
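The selection rule above can be sketched as a search over truncation points. The criterion of Chapter V is not reproduced in this section, so the sketch substitutes the standard least-squares cross-validation score for ungrouped data on $[-\pi, \pi]$, which for a Fourier series estimate reduces to $\frac{1}{2\pi}\sum_{|v| \le m} \{2 - (n+1)|\hat{\phi}(v)|^2\}/(n-1)$. That substitution, the function names, and the simulated sample are all assumptions made for illustration.

```python
import cmath
import math
import random

def phi_hat_sq(xs, v):
    """|phi_hat(v)|^2, with phi_hat(v) = (1/n) * sum_j exp(-i*v*x_j)."""
    return abs(sum(cmath.exp(-1j * v * x) for x in xs) / len(xs)) ** 2

def lscv(xs, m):
    """Cross-validation score of the order-m Fourier series estimate:
    integral of fhat^2 minus (2/n) * sum_j fhat_without_j(x_j)."""
    n = len(xs)
    total = -1.0  # the v = 0 term, since |phi_hat(0)| = 1
    for v in range(1, m + 1):
        # the terms for +v and -v are equal, hence the factor 2
        total += 2.0 * (2.0 - (n + 1) * phi_hat_sq(xs, v)) / (n - 1)
    return total / (2.0 * math.pi)

def best_truncation(xs, max_m):
    """Return the m in 1..max_m with the smallest score, plus all scores."""
    scores = {m: lscv(xs, m) for m in range(1, max_m + 1)}
    return min(scores, key=scores.get), scores

rng = random.Random(1)
# Made-up sample on [-pi, pi]; it is not related to the LRL data.
sample = [max(-math.pi, min(math.pi, rng.gauss(-1.4, 0.4))) for _ in range(200)]
m_star, all_scores = best_truncation(sample, 20)
```

Each added term is negative only when $|\hat{\phi}(v)|^2 > 2/(n+1)$, so the search in effect keeps adding coefficients while they remain clearly distinguishable from sampling noise.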

The minimum value of $\hat{J}(\hat{f}_{0,k})$ (for k = 1,2,...,50) occurs
at k = 42. The estimate $\hat{f}_{0,42}(\cdot)$ is plotted in Figure 6.3 and
nine of the thirteen bumps of Good and Gaskins are identified
(using their numbering scheme). Our much simpler analysis seems
to have arrived at essentially the same results as those of Good
and Gaskins, although collaboration with a subject matter expert
would be essential to correctly interpret differences in the
estimates.

As a final observation concerning the LRL data we point
out the similarity of $\hat{f}_{2,0}(\cdot)$ and $\hat{f}_{0,42}(\cdot)$, which is striking when
one considers that $\hat{f}_{2,0}(\cdot)$ is based only on $\hat{\phi}(1)$ and $\hat{\phi}(2)$. The
paucity of parameters required for $\hat{f}_{2,0}(\cdot)$ to correctly describe
the major features of the LRL data becomes important in smaller
samples. This fact will be illustrated in Section 6.4.


6.3  The  Maguire  Data 


The data set to be analyzed in this section appears in
Carmichael (1976) and has been studied by Maguire, Pearson
and Wynn (1952), Boneva, Kendall and Stefanov (1971), and, in
a density estimation context, by Carmichael (1976). The data,
which we shall refer to as the Maguire data, consist of 109
"time intervals in days between explosions in mines involving
more than ten men killed, from December 6, 1875 to May 29, 1951".
For our purposes, the 109 values will be regarded as independent
realizations of a random variable whose density function we wish
to estimate.

An initial look at a histogram of the Maguire data indicates
the possibility of an underlying exponential type density.
Therefore, since Fourier series approximation methods work best
for functions whose tails are similar, we have employed what
Carmichael refers to as symmetrization. Symmetrization entails
transforming the original data to the interval $[0, \pi]$, and then
estimating the density $f^*(\cdot)$ of the transformed data by first
estimating

$f(x) = \tfrac{1}{2} f^*(|x|), \quad x \in [-\pi, \pi].$

The evenness of $f(\cdot)$ implies that

$\phi(v) = \int_{-\pi}^{\pi} (\cos vx - i \sin vx) f(x)\,dx = 2 \int_0^{\pi} \cos vx \, f(x)\,dx = \int_0^{\pi} \cos vx \, f^*(x)\,dx, \quad |v| = 0, 1, 2, \ldots .$




Given the previous expression for $\phi(v)$, we form estimates

$\hat{\phi}(v) = \frac{1}{109} \sum_{j=1}^{109} \cos vx_j, \quad |v| = 0, 1, 2, \ldots,$

of $\phi(v)$, where $x_j$, $j = 1, \ldots, 109$, is the Maguire data rescaled
to the interval $[0, \pi]$. (The transformation $x_j = \pi t_j / 1630$, with $t_j$
the $j$th interval in days, was used since 1630 was the longest number
of days between explosions.) Estimates

$\hat{f}^*_{p,q}(x) = 2 \hat{f}_{p,q}(x), \quad x \in [0, \pi],$

of the density $f^*(\cdot)$ may then be constructed, where

U  +  2Real{e  (F  (x) )  -  F_(x)>] 
p,q  p  q  0 

and 

F  (x)  -  Z  $(v)eivx. 

2  v-k 
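The cosine-coefficient estimates used for the symmetrized data can be sketched as follows. The interval lengths below are invented (they are not the Maguire data), the function names are hypothetical, and the rescaling assumes that each value is divided by the longest interval and multiplied by $\pi$, which is how the rescaling to $[0, \pi]$ is read here.

```python
import cmath
import math

def rescale(days, longest):
    """Rescale positive interval lengths to [0, pi] (assumed form: pi * t / longest)."""
    return [math.pi * t / longest for t in days]

def cosine_coefficients(xs, max_v):
    """phi_hat(v) = (1/n) * sum_j cos(v * x_j) for v = 0..max_v."""
    n = len(xs)
    return [sum(math.cos(v * x) for x in xs) / n for v in range(max_v + 1)]

def reflected_coefficients(xs, max_v):
    """Complex coefficients of the sample reflected about zero, {x_j} and {-x_j}."""
    both = xs + [-x for x in xs]
    return [sum(cmath.exp(-1j * v * x) for x in both) / len(both)
            for v in range(max_v + 1)]

days = [30, 120, 5, 400, 1630, 75, 12, 260]  # invented values, illustration only
xs = rescale(days, max(days))
phi = cosine_coefficients(xs, 6)
check = reflected_coefficients(xs, 6)
```

The two coefficient lists agree, which is the sample analogue of the evenness argument: reflecting the data about zero removes the sine terms and leaves real-valued coefficients.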

A portion of the S-array for $\{(-1)^v \hat{\phi}(v)\}$ is given in
Table 6.2. Note that, since in this case $\hat{\phi}(v)$ is real-valued,
the S-array is also real-valued. Based on Table 6.2, $\hat{f}_{1,0}(\cdot)$
and $\hat{f}_{1,2}(\cdot)$ seem to be the best supported estimates of $f(\cdot)$,
and certainly no estimate $\hat{f}_{p,q}(\cdot)$ for p > 1 is supported. For
purposes of the MISE criterion of Chapter V, this pattern in
the S-array suggests that, among estimates $\hat{f}_{p,q}(\cdot)$ (p > 0), we
limit consideration to estimates with p = 1.

In order to objectively choose an estimate of $f^*(\cdot)$,
we shall calculate $\hat{J}(\hat{f}^*_{p,q})$ for each estimate in the class

$A = \{\hat{f}^*_{p,q}(\cdot) : p = 0, 1 \text{ and } p + q \le 20\}.$

Since $\phi(v)$ is estimated differently here than in previous chapters,
the estimate $\hat{J}(\hat{f}_{p,q})$ derived in Chapter V must be modified
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TABLE  6-2 

MAGUIRE DATA S-ARRAY FOR $\{(-1)^v \hat{\phi}(v)\}$

 m/n         1          2          3          4          5          6

  -6     -2.3442     4.1873     7.6413      .6243    31.5796      .5846
  -5     -2.6173     2.2878    -1.1070   -24.0702    25.8274    -3.2688
  -4     -2.5935    -1.9669    54.7387   -27.1260    82.1131    -4.2798
  -3     -2.6195   -57.7808   106.6615   -13.2873    -1.5643     6.9471
  -2     -2.2463     2.3738     1.9639    -4.6324    -6.4719     4.8715
  -1     -2.2317    81.9580     3.3973    -9.9555    -8.5664    50.7821
   0     -1.8119     1.8528    -1.1989     1.0701    -1.2228     1.2530
   1     -1.8024    30.1227     -.7049     2.6439    -1.0416    -1.3733
   2     -1.6175     1.5569    -1.5230     1.7872     -.4512     6.5955
   3     -1.6275      .6980    12.5592     4.2834    -5.1103     3.7425
   4     -1.6183   -12.1080     4.3602     3.4221    68.0605     2.8217
   5     -1.7439     4.6512    -3.5795    -4.6306    19.4732     -.4310
slightly. We have

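$\mathrm{MISE}(\hat{f}^*_{p,q}) = E\left[\int_0^{\pi} (\hat{f}^*_{p,q}(x) - f^*(x))^2\,dx\right]$

and

$J(\hat{f}^*_{p,q}) = E\left\{\int_0^{\pi} [\hat{f}^*_{p,q}(x)]^2\,dx - 2\int_0^{\pi} \hat{f}^*_{p,q}(x)\,dF^*(x)\right\}.$

Proceeding as in Chapter V, it may be verified that the
appropriate estimate of $J(\hat{f}^*_{p,q})$ is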

J(f*  o)  “  '  +  '2f?~l}^2(v)]  +  t(l  1)^ 

p,q  ir  (n-1)^  ,rU'1Vl 

and 


J(f*  >  -  J(f*  )  +  4/ 
p.q  0,p+q  i 


4 

8b  a(x)dx  ~  n 

p.q  O 


n  . 
*  8, 
J-l 


p.q.* 


(x  } 

(J)  3 


+  8/fff  (x)g  (x)dx,  p  >  0  . 
0  0, p+q  P»q 


Table 6.3 contains the value of $\hat{J}(\hat{f}^*_{p,q})$ for each of the
estimates in A, and shows the minimum of $\hat{J}(\hat{f}^*_{p,q})$ to occur at
$\hat{f}^*_{0,12}(\cdot)$. This estimate is plotted in Figure 6.4, and, for
the sake of comparison, the estimate $\hat{f}^*_{1,10}(\cdot)$ (at which $\hat{J}(\hat{f}^*_{1,q})$
is minimized) is plotted in Figure 6.5. The estimate $\hat{f}^*_{1,10}(\cdot)$
is seen to be smoother in the tail than is $\hat{f}^*_{0,12}(\cdot)$, a feature
which we noted to be a characteristic of ARMA approximators.
Whether or not the extra smoothing done by $\hat{f}^*_{1,10}(\cdot)$ is warranted
might best be judged by someone knowledgeable about the physical
situation which generated these data.

An interesting aspect of Table 6.3 is the magnitude of
$\hat{J}(\hat{f}^*_{1,0})$ relative to the minimum value of $\hat{J}(\hat{f}^*_{p,q})$. A comparison
of these two numbers confirms that a low frequency component is


TABLE 6.3

VALUES OF $\hat{J}(\hat{f}^*_{p,q})$ FOR THE MAGUIRE DATA

   k     $\hat{J}(\hat{f}^*_{0,k})$     $\hat{J}(\hat{f}^*_{1,k-1})$

   1      -.7360      -1.1907
   2     -1.0037      -1.1939
   3     -1.1021      -1.1807
   4     -1.1363      -1.1762
   5     -1.1449      -1.1687
   6     -1.1468      -1.1828
   7     -1.1502       -.9564
   8     -1.1695      -1.1538
   9     -1.1805      -1.2262
  10     -1.2075      -1.1659
  11     -1.2179      -1.2266
  12     -1.2291        .2571
  13     -1.2250      -1.2215
  14     -1.2205      -1.1965
  15     -1.2141      -1.2155
  16     -1.2085      -1.2176
  17     -1.2039      -1.2222
  18     -1.2057      -1.2138
  19     -1.2091      -1.1927
  20     -1.2106      -1.1490
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the  dominant  feature  of  the  Maguire  data,  a  fact  which  was 
predicted  by  the  first  column  of  the  S-array. 

A final comment about this data set concerns the ability
of $\hat{J}(\hat{f}^*_{p,q})$ to distinguish between differing estimates. Figure
6.6 shows a plot of the estimate $\hat{f}^*_{1,6}(\cdot)$, which has the third
largest value of $\hat{J}(\hat{f}^*_{p,q})$ in Table 6.3. Note that $\hat{f}^*_{1,6}(\cdot)$ does
not have its maximum at zero, which is true of only one of the
other estimates considered. The other estimate of which this
is true, $\hat{f}^*_{1,11}(\cdot)$, maximizes $\hat{J}(\hat{f}^*_{p,q})$ and is such that $\hat{f}^*_{1,11}(0) = -11.87$.
This is evidence that the criterion $\hat{J}(\hat{f}^*_{p,q})$ is able to
identify the poorer estimates of $f^*(\cdot)$.
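The readings of Table 6.3 quoted in this section can be checked mechanically. The sketch below copies the printed values and rests on one assumption about the layout, which the column headings in the scan do not confirm: the first column is taken to hold the Fourier series estimates (p = 0, q = k) and the second the estimates with p = 1 and q = k - 1, so that p + q = k in both columns.

```python
# Table 6.3 as printed: k -> (first column, second column).
TABLE_6_3 = {
    1: (-0.7360, -1.1907), 2: (-1.0037, -1.1939), 3: (-1.1021, -1.1807),
    4: (-1.1363, -1.1762), 5: (-1.1449, -1.1687), 6: (-1.1468, -1.1828),
    7: (-1.1502, -0.9564), 8: (-1.1695, -1.1538), 9: (-1.1805, -1.2262),
    10: (-1.2075, -1.1659), 11: (-1.2179, -1.2266), 12: (-1.2291, 0.2571),
    13: (-1.2250, -1.2215), 14: (-1.2205, -1.1965), 15: (-1.2141, -1.2155),
    16: (-1.2085, -1.2176), 17: (-1.2039, -1.2222), 18: (-1.2057, -1.2138),
    19: (-1.2091, -1.1927), 20: (-1.2106, -1.1490),
}

def labelled_scores(table):
    """Map (p, q) -> criterion value, ASSUMING columns are (0, k) and (1, k - 1)."""
    scores = {}
    for k, (fourier_value, arma_value) in table.items():
        scores[(0, k)] = fourier_value
        scores[(1, k - 1)] = arma_value
    return scores

scores = labelled_scores(TABLE_6_3)
ranked = sorted(scores, key=scores.get)  # smallest criterion value first
best_overall = ranked[0]
best_with_p1 = min((pq for pq in scores if pq[0] == 1), key=scores.get)
three_largest = ranked[::-1][:3]
```

Under that layout the overall minimizer is (p, q) = (0, 12), the minimizer among the p = 1 estimates is (1, 10), and the three largest values belong to (1, 11), (0, 1) and (1, 6), in that order.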

6.4  A  Simulation  Study 

The purpose of this section is to investigate

(i) the effectiveness of the MISE criterion for
selecting (p,q) in distinguishing between
estimates which have important differences
in ISE, and

(ii) the possible savings in ISE which may be
attained by using ARMA, rather than Fourier
series, density estimation.

In order to accomplish the above, simulations (which will hereafter
be referred to as Simulations 1, 2, and 3) involving
three different density functions have been carried out. A
description of these simulations follows.

Simulation 1. In this study, 25 independent random
samples, each of size 100, were generated from the Beta (12,3)
distribution using the IMSL subroutine GGBTR. The data
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and 


$A_{22} = \{\hat{f}''_{0,q}(\cdot) : q = 1, 2, \ldots, 10\}$

$A_{23} = A_{22} \cup \{\hat{f}''_{1,q}(\cdot) : q = 0, 1, \ldots, 4\}.$

For each of these three estimates, $\mathrm{ISE}(\hat{f}''_{p,q})$ was evaluated.

Simulation 3. In this final simulation, 15 independent
random samples of size 100 were generated from the density
(pictured in Figure 6.7)

$f'''(x) = \frac{1}{2\pi}\left[1 + 2\,\mathrm{Real}\{e_2(F_4(x))\}\right] I_{[-\pi,\pi]}(x),$

where

$F_4(x) = \sum_{v=1}^{4} \hat{\phi}(v) e^{ivx}$

and $\{\hat{\phi}(v)\}$ is the sequence of estimated Fourier coefficients from the LRL
data. The samples were generated as in Simulation 2, although
in this case values of the inverse cdf had to be evaluated
numerically. For each sample, $\hat{J}(\hat{f}'''_{p,q})$ was calculated for each
estimate in

$A_{31} = \{\hat{f}'''_{p,q}(\cdot) : p = 0, 1, 2 \text{ and } p + q \le 10\},$

and the $A_{31}$ and $A_{32}$ ARMA estimates were identified, where

$A_{32} = \{\hat{f}'''_{0,q}(\cdot) : q = 1, 2, \ldots, 10\}.$

(Reasons for considering the classes of estimates defined in this
and the previous simulation are discussed below.) Finally,
$\mathrm{ISE}(\hat{f}'''_{p,q})$ was calculated for these two estimates.
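Generating samples by numerically inverting a cdf, as described for Simulation 3, can be sketched as follows. The density of Simulation 3 depends on the LRL coefficients and on the transform $e_2$, neither of which is reproduced here, so the sketch uses a simple stand-in density on $[-\pi, \pi]$, namely $(1 + 2a\cos x)/(2\pi)$ with $a = 0.4$, and inverts its cdf by bisection. The stand-in density, the function names, and the choice of bisection are all assumptions; the method actually used in the study is not stated.

```python
import math
import random

A = 0.4  # |A| <= 0.5 keeps the stand-in density nonnegative

def cdf(x):
    """cdf of the stand-in density (1 + 2*A*cos(x)) / (2*pi) on [-pi, pi]."""
    return (x + math.pi) / (2.0 * math.pi) + A * math.sin(x) / math.pi

def inverse_cdf(u, tol=1e-10):
    """Solve cdf(x) = u by bisection; valid because cdf is nondecreasing."""
    lo, hi = -math.pi, math.pi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def draw_sample(n, rng):
    """Inverse-cdf sampling: apply the numerical inverse to uniform variates."""
    return [inverse_cdf(rng.random()) for _ in range(n)]

sample = draw_sample(100, random.Random(0))
```

Bisection needs only cdf evaluations and monotonicity, which makes it a safe default when the density is a trigonometric series whose cdf has no closed-form inverse.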

The results of Simulations 1, 2, and 3 are summarized in
Tables 6.4 - 6.6. In order to define some descriptive measures
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FIGURE  6.7 

Density Function $f'''(\cdot)$




DESCRIPTIVE  STATISTICS  FOR  SIMULATIONS  1,  2  AND  3 
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RESULTS  OF  TESTS  OF  THE  HYPOTHESES 


savings in ISE results from using ARMA density estimation rather
than  Fourier  series  density  estimation.  The  three  hypotheses 
tested  are 


and 


‘or 

^l’1 

vs.  Hjj.:  R^  <  1 

!02: 

R23  "  1 

vs.  H12:  R23  <  1 

H03:  R31 


vs. 


H, 


13  * 


R31  <  X* 


The test statistic for $H_{0i}$ vs. $H_{1i}$ is the Wilcoxon signed-rank
statistic

$W_{ij} = \sum_{k} r(|y_{ijk}|)\, I_{(0,\infty)}(y_{ijk}),$

where $r(\cdot)$ denotes rank. The reason for the use of the log
transformation is that, since typically the distribution of ISE
is skewed,

$y_{ijk} = \ln\left[\mathrm{ISE}(\hat{f}_{ijk}) / \mathrm{ISE}(\hat{f}_{i2k})\right] = \ln[\mathrm{ISE}(\hat{f}_{ijk})] - \ln[\mathrm{ISE}(\hat{f}_{i2k})]$

is more nearly symmetrically distributed about zero under $H_{0i}$
than is $\mathrm{ISE}(\hat{f}_{ijk}) - \mathrm{ISE}(\hat{f}_{i2k})$ about its mean. This is an important
consideration since the Wilcoxon test is based on an assumption
of symmetry. The results of the above tests are indicated in
Table 6.5 by P values which are defined by $P_{ij} = P(W^+ \le w_{ij})$
where $W^+$ is a random variable having the distribution of a
Wilcoxon signed-rank statistic. In addition, 95% confidence
intervals for the parameters $R_{ij}$ are given.
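The test just described can be sketched in a few lines. The ISE values below are invented for illustration (they are not results from the simulations), the function names are hypothetical, and the sketch assumes no zero differences and no tied absolute values so that the exact null distribution of $W^+$ applies.

```python
import math

def signed_rank_statistic(diffs):
    """W+ = sum of the ranks of |d| over the positive d (no zeros or ties assumed)."""
    order = sorted(range(len(diffs)), key=lambda k: abs(diffs[k]))
    return sum(rank for rank, k in enumerate(order, start=1) if diffs[k] > 0)

def lower_tail_p(w, n):
    """Exact P(W+ <= w) under the null hypothesis, where each of the ranks
    1..n is included in W+ independently with probability 1/2."""
    counts = [1] + [0] * (n * (n + 1) // 2)  # counts[s] = number of subsets with sum s
    for r in range(1, n + 1):
        for s in range(len(counts) - 1, r - 1, -1):
            counts[s] += counts[s - r]
    return sum(counts[: w + 1]) / 2 ** n

# Invented ISE values (NOT from the thesis) for paired ARMA and Fourier estimates.
ise_arma = [0.021, 0.034, 0.018, 0.040, 0.027, 0.015]
ise_fourier = [0.030, 0.031, 0.029, 0.055, 0.035, 0.026]
log_ratios = [math.log(a) - math.log(f) for a, f in zip(ise_arma, ise_fourier)]
w_plus = signed_rank_statistic(log_ratios)
p_value = lower_tail_p(w_plus, len(log_ratios))
```

A small value of $W^+$ means that the log ratios are mostly negative, that is, the ARMA estimate usually has the smaller ISE, which is why the lower tail probability is the relevant P value.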

The  results  in  Table  6-5  address  the  second  of  the  two 
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considerations  which  were  the  intent  of  our  Investigation  at 
the beginning of this section. These results are strong evidence
that a savings in ISE is realized if the MISE criterion
of Chapter V is allowed to choose from a class of ARMA (p > 0,
q > 0) estimates rather than a class containing only Fourier
series estimates. It is important to note, though, that the
results obtained are conditional on the particular densities
considered, the sample sizes used, and the classes of ARMA
estimates chosen for consideration. Perhaps the most important
of these three points is the choice of a class of estimates.

The estimates in Table 6.4 indicate that the number
of ARMA estimates in the chosen class can be an important
consideration. Further, it is not clear how the results of Tables
6.4  and  6.5  would  have  been  affected  if  the  classes  A2^>  Ajy 
and  had  included  estimates  with  larger  values  of  p.  The 
restriction  of  the  size  of  A^  and  A^  was  motivated  by  the 
fact that, in some initial repetitions of Simulation 1 (previous
to those upon which Tables 6.4 and 6.5 are based), the
S-array for $\{(-1)^v \hat{\phi}(v)\}$ showed a good constancy pattern in
column 1. A similar statement is true regarding $A_{31}$ and Simulation 3,
in which case constancy was apparent in the first two
columns of a typical S-array.

The  first  of  the  two  points  which  were  to  be  investigated 
in  this  section  concerned  the  ability  of  the  MISE  criterion  to 
distinguish  between  estimates  having  important  differences  in 
ISE.  Evidence  of  this  ability  is  given  in  Table  6.6  .  In 
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(1)  The  class  of  Fourier  series  and  autoregressive  density 
estimators is a subclass of the class of ARMA estimators.

(2) The ARMA estimator $\hat{f}_{p,q}(\cdot)$ estimates an approximator
$f_{p,q}(\cdot)$ which was shown to be related to the $e_n$-transform.
This relationship implies that $f_{p,q}(x)$ is often a better
approximation to $f(x)$ than is $f_{0,p+q}(x)$, a Fourier series
approximator.

(3) The estimator $\hat{f}_{p,q}(\cdot)$ may be expressed in terms of a
quantity which was shown to be an adaptive, generalized
jackknife statistic.

(4) The mixture of densities having autoregressive representations
is, in general, a density which has an ARMA representation.
This result implies that ARMA representations
often require fewer parameters to adequately fit a density
than do autoregressive representations.

(5) In a probability sense, the estimator $\hat{f}_{1,q}(\cdot)$ possesses
(under certain conditions) a more rapid convergence
property analogous to that possessed by $e_1$ in the deterministic
setting.

(6)  Two  solutions  were  proposed  to  the  problem  of  selecting 
an  appropriate  estimate  from  the  class  of  ARMA  estimates. 

One solution utilizes the S-array, and in the other solution
an estimator is sought which will minimize $\mathrm{MISE}(\hat{f}_{p,q})$.

(7) Simulation studies indicate that (for the densities considered)
a savings in ISE results from allowing the MISE
criterion to choose from a class of ARMA (p > 0, q > 0)
estimates rather than from a class of only Fourier series
estimates.


TABLE 6.6

THE ABILITY OF THE MISE CRITERION TO DISTINGUISH
ESTIMATES WITH DIFFERENT VALUES OF ISE

Type of Estimate         P'         P''

$A_{11}$                88.00      63.64
$A_{13}$               100.00     100.00
$A_{21}$                68.00      58.82
$A_{23}$                64.00      81.25
$A_{31}$                73.33      90.91

Notes:

1. P' is the percentage of cases in which $\hat{J}(\hat{f}_{ijk}) \ne \hat{J}(\hat{f}_{i2k})$.

2. Among the cases satisfying $\hat{J}(\hat{f}_{ijk}) \ne \hat{J}(\hat{f}_{i2k})$, P'' is the
percentage of cases in which the signs of $\hat{J}(\hat{f}_{ijk}) - \hat{J}(\hat{f}_{i2k})$
and $\mathrm{ISE}(\hat{f}_{ijk}) - \mathrm{ISE}(\hat{f}_{i2k})$ are the same.


addition, it is noted that for all three simulations the estimates
which had the larger values of $\hat{J}(\hat{f}_{p,q})$ were consistently
among the poorer (in terms of ISE) estimates.

The final remarks to be made in this section concern
Simulation 3. In Section 6.2 it was observed that the first
two estimated Fourier coefficients of the LRL data contained
essentially all the information about the main features of
that data set, a fact that is not detected by Fourier series
estimates, $\hat{f}_{0,q}(\cdot)$. With this in mind, one of the aims of
Simulation 3 was to illustrate that, if moderate sized samples
were generated from a density like that of the LRL data, a
parsimonious ARMA estimate would be preferred to a Fourier
series estimate. That this is the case is evidenced by
Table 6.5 and the average number of Fourier coefficients,
$N(A_{3j})$, used by the $A_{3j}$ ARMA estimates of Simulation 3. We
have

$N(A_{31}) = 4$ and $N(A_{32}) = 5.47$,

and thus the considerable savings in ISE obtained using the $A_{31}$
ARMA estimate occurred even though the ARMA estimates were, on
the average, based on fewer fitted parameters than the Fourier
series estimates.

6.5  A  Summary 

A new class of estimators of a probability density function,
referred to as the class of ARMA estimators, has been
introduced in this work. The principal results obtained concerning
this class of estimators may be summarized as follows.


Although some important results have been obtained in
this work, there remain numerous topics for future research
in ARMA density estimation. Some of these topics, such as the
establishment of more general large sample properties, the
routine choice of a class of estimates, and further investigation
of the problem of selecting p and q, have been alluded
to previously. However, perhaps the most important area for
future research is a large-scale comparison of ARMA density
estimation to other common methods of density estimation.

Even though new, different methods of viewing an old problem
are of value, it is probably desirable to be somewhat economical
with regard to the number of new methods proposed. For this
reason, before being recommended for widespread use each new
method should be validated against existing methods. A part
of this validation for ARMA density estimation has been accomplished
in this work, and the results thus far obtained indicate
the possibility of a valuable new method.
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